Application of Open-Source, Low-Code Machine-Learning Library in Python to Diagnose Parkinson's Disease Using Voice Signal Features

Silva, Daniel Hilário da; Ribeiro, Caio Tonus; Souza, Leandro Rodrigues da Silva; Pereira, Adriano Alves

doi:10.1590/1678-4324-2025230860

Abstract

Parkinson's disease (PD), the second most prevalent neurodegenerative disorder after Alzheimer's disease, affects approximately 10 million individuals worldwide. The disease is characterized by both motor and non-motor symptoms, and clinical aspects are pivotal for diagnosis. Vocal abnormalities can be identified in about 90% of PD patients in the early stages of the condition. Machine Learning (ML), a prominent subfield of Artificial Intelligence (AI), holds significant promise in the medical domain, particularly for early disease detection, enabling effective preventive measures and treatments. In this paper, we considered the unique characteristics of each ML algorithm. Seventeen ML algorithms were applied to a dataset of voice recordings from Healthy Control and PD individuals, sourced from a publicly available repository. We leveraged the PyCaret Python library's ML algorithms and functions, which were introduced in this article, to demonstrate their simplicity and effectiveness in dealing with real-world data. Among these algorithms, Extra Trees Classifier (ETC), Gradient Boosting Classifier (GBC), and K Neighbors Classifier (KNN) exhibited the best performance for the given dataset. Furthermore, to enhance the models' performance, we employed various techniques, including Synthetic Minority Over-sampling Technique (SMOTE) to address class imbalance, feature selection based on correlation, and hyperparameter tuning. Our findings highlight the potential of the PyCaret ML library demonstrated in this article as a valuable tool for applying ML to the classification of Parkinson's disease through voice analysis. The application of ML in this context can greatly support clinical decision-making, leading to more informed and precise interventions.

Keywords:
machine learning; medical diagnosis; Parkinson’s Disease; PyCaret; voice signal.

HIGHLIGHTS

Numerous classifiers were evaluated to for classify Parkinson disease in voice recordings.

Low-code machine-learning library in Python to simplify the analysis process.

The Synthetic Minority Oversampling Technique and Feature Selection method were used.

Best results achieved using hyperparameter tuning.

INTRODUCTION

Parkinson’s disease (PD) is a neurological disorder that affects approximately 10 million people worldwide [¹], with global occurrence doubling in the past 25 years due to increased longevity and longer duration of disease [²,³], being the second most common neurodegenerative disease after Alzheimer’s disease [⁴,⁵].

Motor and non-motor symptoms are present in the clinical presentation, and clinical aspects are used to make the diagnosis [⁶]. Early disease identification is essential for planning efficient preventive measures and treatments, and the prediction of chronic diseases is of the greatest importance in healthcare informatics [⁷]. Although a variety of pharmacological therapies are utilized to reduce the problems brought on by the condition, Parkinson's disease is often treated by invasive procedures, such as Deep Brain Stimulation (DBS) or ablation [⁸]. Multiple research projects have demonstrated the connection between speech problems and Parkinson's disease [¹].

Noninvasive speech tests have been explored as a new marker for PD since speech deterioration, classified as one of the motor symptoms of the condition, is consistently observed in approximately 90% of PD patients presenting with some vocal disorder in the initial stages [⁴,⁹-¹¹]. Voice recordings have been considered as a potential (non-invasive and low cost) biomarker to diagnose some voice related diseases [¹²-¹⁵], and in a previous study the authors showed that PD patients were identified using acoustic features extracted from a speech test and the voice is an early biomarker for PD detection [¹¹,¹³,¹⁶].

The Artificial Intelligence (AI) is an area of computer science that aims to develop systems that simulate the human ability to perceive a problem [¹⁷] and the Machine Learning (ML) is a subfield of AI that studies the ability of computers to learn from previous experiences [¹⁸]. According to Hastie and coauthors (2019) [¹⁸] AI is the capability for machines to imitate intelligent human behavior, while ML is an application that allows computer systems to automatically learn from experience without explicit programming [¹⁹]. With advances in technology and computational capabilities, ML has emerged as a valuable tool for the early prediction of diseases [¹⁶].

For the application and use of ML, there are several packages, libraries, and toolboxes for the Python language and the PyCaret that is an open-source, low-code machine learning library that automates machine learning processes that provides by object-oriented language, and extensible tools for data preparation, classification, regression, clustering, anomaly detection, time series and some examples of data sets [²⁰].

This library is an alternative low-code library that can be used to replace hundreds of lines of code with just a few lines of code. In essence, PyCaret is a Python wrapper for a number of machine learning frameworks and libraries [²⁰,²¹]. The PyCaret Python library will be introduced in this article, and it will be used to apply a PD dataset to demonstrate the simplicity and efficacy of this library utilizing data from real issues. Additionally, it explains how it can be applied to the classification of Parkinson's disease through voice analysis, and as a supportive tool for its diagnosis.

We’ll describe the public repository from which the data were selected, how the acoustic variables were recorded, the information about the variables collected, data preparation, and the functions used for statistical analysis, creation, analysis, and optimization of ML models. Additionally, we will describe in detail the results for the ML models applied after feature selection (FS) and the model optimization, presenting the discussion and conclusions about the results afterward.

The primary aim of our study is to assess the effectiveness of low-code tools in diagnosing Parkinson's disease through the analysis of voice recordings. Therefore, the inclusion of the patient’s Hoehn & Yahr scale stage data introduces a challenge when directly comparing our results with recent studies that make use of machine learning models on the same dataset, as demonstrated in [⁹] and [²²], and for this reason, was considered only the original dataset as presented in [²³]. The scientific and academic relevance of this study refers to the utilization of a readily applicable and replicable low-code library associated with the comparison of the approaches from 17 ML models for the detection of PD using a voice dataset, as well as the dissemination of information concerning Parkinson's disease, the second most prevalent neurodegenerative condition globally. The social relevance involves the use of data acquired through a non-invasive methodology for aiding as a tool used in the diagnosis of Parkinson's disease.

MATERIAL AND METHODS

In this section, information on participants, speech recordings, data preparation, feature extraction, cross-validation methods and functions for the model analysis are presented.

Participants

The dataset used in the experiments consists of features obtained from the “Oxford Parkinson’s Disease Detection Dataset with Acoustic Features”, which was donated to the UCI Machine Learning Repository by [²³] in 2019 and it is available online [²⁴]. According to the authors the data for this study consists of 195 rows representing samples and columns which represent a particular voice measure from 31 male and female subjects, of which 23 were diagnosed with PD and 8 were from the Healthy Controls (HC) as shown in Table 1. The ages of the subjects ranged from 46 to 85 years and the time since diagnoses ranged from 0 to 28 years. The mean age (± standard deviation) of the individuals was 65.80 ± 9.80. It can be observed from [²³] that the number of patients in the lower stages of the Hoehn & Yahr scale less than or equal 2.0, represents 47.83% of the subjects and the number in advanced stages Hoehn & Yahr more than or equal than 2.5 represents 52.17% of the subjects with PD. The mean and standard deviation of years since the diagnosis of the patients is 2.22 ± 0.73, and more details about the stage of each subject can be observed in [²³].

Thumbnail

Table 1
Distribution of the participants of the study.

In the dataset, 195 records were observed, considering only 31 subjects with 24 variables in total, 22 variables with data over the voice, and two variables with data over the code and status of the individuals. Each row of the dataset represents one voice recording for 36 seconds, with an average of six recordings produced for each patient. Six recordings were made for 22 patients, while seven recordings were made for nine patients [²²,²³].

The phonations were recorded in an IAC sound-treated booth using a head-mounted microphone (AKG C420) positioned at 8 cm from the lips. The voice signals were recorded directly to the computer using CSL 4300B hardware (Kay Elemetrics), sampled at 44.1 kHz, with 16 bits resolution [²³]. Table 2 shows 23 features (22 acoustic features and 1 that represents the status) used in this study and their descriptions.

Thumbnail

Table 2
Variables used in the experiments.

The labels for the cases are 0 for healthy and 1 for PD patients, respectively. The attribute details of the data set are compiled in Table 2. Figure 1 shows the distribution of 'spread2' and 'MDVP:RAP' from our PD dataset, integral to the descriptive analysis phase. The presentation of histograms and KDE together with the Q-Q plots provides a dual perspective on distribution traits. 'Spread2' aligns closely with the expected normal pattern, whereas 'MDVP:RAP' displays significant variance, indicating a non-normal distribution. These findings guide our data preprocessing decisions and highlight the diversity of acoustic feature behavior.

Figure 1
Distributions for the features (a) Histogram with KDE for 'spread2', b) Q-Q plot for 'spread2', (c) Histogram with KDE for 'MDVP:RAP' and (d) Q-Q plot for 'MDVP:RAP' to evaluate data normality.

Figure 2, present the scatter plots exploring the pairwise relationships between several acoustic features. Each point on the plots corresponds to an individual sample, differentiated by color to distinguish healthy individuals from PD patients. The side-by-side arrangement of 'PPE' with 'RPDE', 'spread1', and 'HNR', as well as 'MDVP:RAP' with 'MDVP:PPQ', offers insights into the potential correlations and distinctive distribution patterns.

Figure 2
Scatter plots of (a) 'PPE' versus 'RPDE', (b) 'PPE' versus 'spread1', (c) 'MDVP:RAP' versus 'MDVP:PPQ' and (d) 'PPE' versus 'HNR', categorized by the status.

The proposed method is designed to classify whether the patient has PD or not by using the Python language and PyCaret Library. The methodology of the proposed model is structured into the following steps: data preprocessing, setup, model training, analyzing the performance, tuning the hyperparameters and deploy. These steps are structured on the framework for the classification modeling show in Figure 3.

Figure 3
Framework for machine learning modeling for PD classification.

Framework for classification modeling

For supervised ML modeling, a total of 17 algorithms widely reported in the literature for PD [⁴,⁵,⁷,⁹,¹²,²²] classification and available in the PyCaret library [²⁰] were adopted in a comprehensive approach. Figure 3 delineates the adopted workflow for classification modeling through supervised machine learning within the Parkinson's disease diagnosis framework. The dataset, comprising acoustic information from 31 subjects, amounting to 195 entries, is initially subjected to preprocessing using the Pandas library to ready the data for application via PyCaret's setup function. Subsequently, an Exploratory Data Analysis is undertaken, pivotal for discerning the data's characteristics and patterns. The setup function is deployed to prepare the data for modeling, incorporating feature selection techniques, class balance via SMOTE, K-fold for cross-validation, and z-score normalization, with a focus on training 70% of the data. A multicollinearity threshold is set to ensure the explanatory variables' independence. Following this procedure, 14 acoustic features are selected out of a total of 265 records.

Training encompasses 17 distinct ML algorithms, chosen based on their prominence in scholarly literature and availability within the PyCaret library. The models' performance is evaluated and juxtaposed, facilitating the meticulous selection of the optimal algorithms. Model performance is rigorously scrutinized in both pre and post-hyperparameter tuning. The top three models are then employed for predictions.

The proposed ML framework employs initially 22 acoustic features as predictor variables which are standardized (zero mean, unit variance) and the disease status as response variables.

Importing the data and the preprocessing with the setup function

The Python programming language and functions of Pandas library that is an excellent tool for data manipulation, were employed in the data preparation to simplify the process together with the setup function of PyCaret library. The dataset was imported as a CSV file and had 147 records for PD and 48 for HC, as shown by the "status" column, which we utilized to identify the imbalance of the data.

Specifically, the setup function was employed to prepare the data for model training. This function offers a range of parameters, allowing the creation of a comprehensive data preprocessing pipeline. When the function is executed, a table of valuable information is generated, providing detailed explanations of its settings and parameters [²⁰,²¹].

Our dataset exhibits an imbalance, with fewer samples in the HC class compared to the PD class. To address this class imbalance, we employed the Synthetic Minority Oversampling Technique (SMOTE) [²⁰,²²], on the setup function, a method for improving minority data based on previous samples. SMOTE builds a linear connection using the features and is utilized to tackle the issue of imbalanced data by generating new samples in the minority group and their neighboring samples randomly [²⁵]. This approach effectively balances the dataset and enhances the performance of the classification model [²⁶].

The data normalization technique called z-score was employed, to ensure that all features were given equal weight during the model learning phase [⁵], as the dataset comprises multiple features with different scales, and this step ensures that all features contribute equally to the learning process [¹⁶].

Another crucial hyperparameter considered was the division of the data into two sets: the training data and the testing data, with a proportion of 70% and 30%, respectively. This division was implemented to prevent underfitting and overfitting, as explained in [¹,¹⁶,²²], and like the approach observed in [⁵].

The training set was utilized to train the models and assess their performance using subject-wise cross-validation (CV) of 10-fold to find the appropriate hyperparameters [⁴,⁵,¹⁶]. In contrast, the testing dataset was used to evaluate the model's ability to generalize on unseen data. This division of data aids in validating the model's effectiveness and ensuring its robustness in handling new and unseen data instances. The algorithm's performance was presented in terms of the average metric performance. This procedure needs a certain computational resource allocation, considering the time required to discern the optimal hyperparameters. Nevertheless, it proves crucial, contributing to the acquisition of precise results and mitigating overfitting, especially in scenarios where the dataset size is limited [²⁷] as observed in our study.

The parameter to remove the multicollinearity was applied to select highly correlated voice features, using a multicollinearity threshold equal to 0.96 (r = 0.96), similar to [⁵], which reduced the original subset of 22 acoustic features to 14, emphasizing the extraction of pertinent information.

Figure 4 illustrates the feature selection process, wherein features are selected from the original set, resulting in a subset after applying the Correlation Feature Selection (CFS) technique. This method is employed to select the most relevant or significant features from a dataset based on their correlation with the target variable, as observed in [⁵,²⁸].

Figure 4
Feature selection process applied to voice signals dataset.

Classification models

Based on the research reported in the literature [²,⁴,⁵,⁹,¹⁶] a broad supervised machine learning modeling method was applied, utilizing a framework presented in Figure 3, seventeen algorithms to categorize PD patients, all of which are accessible in the PyCaret library [²⁰]. In the beginning, the following algorithms were used to classify the training data: Ada Boost Classifier (ADA), CatBoost Classifier (CATBOOST), Decision Tree Classifier (DTC), Dummy Classifier (DUMMY), Extra Trees Classifier (ETC), Extreme Gradient Boosting (XGBOOST), Gradient Boosting Classifier (GBC), K Neighbors Classifier (KNN), Light Gradient Boosting Machine (LGHTGBM), Linear Discriminant Analysis (LDA), Logistic Regression (LR), Naïve Bayes (NB), Quadratic Discriminant Analysis (QDA), Random Forest Classifier (RF), Ridge Classifier (RIDGE), Support Vector Machine - Linear Kernel (SVM) and Support Vector Machine - Radial Kernel (RBFSVM).

EDA function

Every machine learning project must include exploratory data analysis (EDA), which uses visuals to aid in our understanding of a dataset's basic statistical features [²⁰]. The EDA function generates an automated analysis using the AutoViz library. As opposed to needing to write numerous lines of code for each chart, AutoViz simplifies the visualization process by requiring just one line of code to generate a variety of insightful plots. This library is made to operate with datasets of any size, automatically choosing the data when necessary to ensure that the visualizations are produced quickly and effectively without sacrificing the insights [²⁰].

The function compare_models, create_model and tune_model

The data configuration and model performance evaluation were accomplished using the setup and create_model functions, respectively, from the PyCaret library in Python. The setup function conducts cross-validation based on a pre-defined number of partitions [⁴], [⁵], [¹¹], [¹³], and during the validation process, it provides a performance metric for each test partition. The create_model function is employed to train a specific model using the parameters specified in the setup function. This process ensures that the model is trained with the relevant data configurations established during the setup phase [²¹].

On the other hand, the compare_models function streamlines the process of selecting the best classification model. It achieves this by training all available models and presenting the results in a user-friendly format [²¹]. The output includes the model names in the first column, followed by various performance metrics. The models are sorted based on our preferred metric, which, in this case, is accuracy. This systematic comparison enables us to identify the most suitable model for our specific classification task efficiently [²⁰].

The tune_model function is instrumental in enhancing the performance of a classification model through hyperparameter tuning [²¹], a technique pointed at finding the optimal combination of its hyperparameters. As highlighted by Ali (2020) [²⁰], this function tunes the hyperparameters of a designated estimator. Its output is a score grid, displaying CV scores by fold for the best selected model based on the optimized parameter. By default, the function utilizes the randomized search method from the scikit-learn library, although other alternative methods are available.

Analysis and model explanation functions

The mean and standard deviation of the performance metric were used to report algorithm performance. In medical diagnostic systems, many measures, including accuracy, the area under the ROC curve (AUC), recall, precision, F1-score, and Matthews [⁷], are used to assess the efficacy of classification techniques [⁴,⁷,¹⁶,²²]. A common performance indicator for models in the literature is accuracy [⁴,⁵,⁹,¹⁶,²⁹]. Since it can be determined based on the values of the confusion matrix, the process of computing is quite simple given that actual class instances are represented as rows and predicted class occurrences are represented as columns.

From the values obtained through the confusion matrix, it is possible to derive several metrics such as accuracy, AUC, recall, precision, F1-score and MCC. The four possible outcomes are: True Positives (TP), True Negatives (TN), False Positives (FP) and False Negatives (FN) [⁹,²²,²⁶]. Using the four outcomes of the confusion matrix we have the expression for Accuracy which is used to determine the classifier’s effectiveness, given by Equation 1; Precision, used to identify the percentage of cases identified as positive by the classifier given by Equation 2; Recall, which represents the percentage of the accurately classified positive samples with PD, given by Equation 3; F1-Score, a metric that combines precision and recall in a single value, given by Equation 4; and Matthews Correlation Coefficient, which is a good measure for data with values where the difference in samples in each class differs is given by the Equation 5 [²⁸,³⁰].

(1)

Accuracy = \frac{TP + TN}{TP + TN + FP + FN}

(2)

Precision = \frac{TP}{TP + FP}

(3)

Recall = \frac{TP}{TP + FN}

(4)

F 1 - Score = 2 * \frac{(P r e c i s i o n * R e c a l l)}{(P r e c i s i o n + R e c a l l)}

(5)

MCC = \frac{TN*TP - FN*FP}{\sqrt{(T P + F P) (T P + F N) (T N + F P) (T N + F N)}}

For an in-depth understanding and comprehensive discussion of the mathematical expressions employed for calculating the metrics presented in this investigation as defined in Equations 1 through 5, interested readers are directed to the works [⁷,²⁵,²⁶] which are relatively simple to be calculated.

The plot_model function allows for the creation of various informative graphs for classification models. In this paper, we utilized this function to generate plots for the top three classification models [²⁰,²¹]. These visualizations offer valuable insights into the performance and characteristics of the models, enabling a comprehensive analysis and comparison of their effectiveness in handling the given dataset.

Area under ROC curve

The Receiver Operating Characteristic (ROC) curve is formed by plotting the true positive rate (TPR) on the y-axis and the false positive rate (FPR) on the x-axis [³¹]. It provides a comprehensive summary of a classification model's prediction performance across all thresholds, enabling a global evaluation of the diagnostic test's value. The area under the ROC curve (AUC) is calculated to quantify the overall performance of the classification model, offering a measure of its discriminatory power in distinguishing between different classes. The AUC value is particularly useful for assessing the model's accuracy and effectiveness in a binary classification task, where it indicates the probability that the model correctly ranks a randomly chosen positive instance higher than a randomly chosen negative instance [²⁹-³²].

Performance and predictions of the model

On the test dataset, which was created during the initialization of the PyCaret library, the predict_model function produces predictions. This enables us to assess the model's precision using data that wasn't used for model training. The function additionally returns a dataframe with the dataset columns and the predicted class for each instance [²¹].

In conclusion, the model's performance on the test dataset is evaluated by monitoring the predictions made by the model using the predict_model function. It is a useful tool for determining how well the model generalizes to new data and helps with the assessment of the model's performance in practical situations [²⁰].

RESULTS

Python programming language version 3.10.12, Visual Studio Code software version 1.80.1 from June 2023 and the low-code machine learning library in Python Pycaret 3.0.4 [²⁰] were used to do the data pre-processing, optimization of the ML algorithms, and analyze the results.

Classification algorithms

The function used to configurate the dataset was the setup, being that for each model was considered the following parameters given by Figure 5.

Figure 5
Parameters of the setup function

The selection of parameters in the setup function, as shown in Fig. 5, was driven by the need to ensure proper data preprocessing and mitigate common ML challenges such as multicollinearity and data imbalance. Key parameters, including 'normalize = True' and 'remove_multicollinearity = True', were selected to standardize the data and automatically remove highly correlated variables, thereby enhancing model performance and reducing the risk of overfitting. These considerations, combined with cross-validation and dataset balancing through SMOTE, provide a robust framework for the development of predictive models, as detailed in our exploratory data analysis.

The other parameters from the setup function were considered as default. After this preparation given by the setup function, it’s possible to use the EDA function to provide an automated analysis of the dataset with a single command, which is essential for comprehending the data's behavior before applying the create_model function. The outcomes generated by this function are depicted in Figure 6, presenting a histogram and a Kernel Density Estimation (KDE) plot of the class distribution. These visualizations serve the purpose of identifying that the data have become balanced after the utilization of the SMOTE parameter in the setup function. The histogram and KDE provide a clear representation of the improved class distribution, reflecting the successful mitigation of the initial data imbalance through the application of SMOTE. As previously mentioned in this paper, the dataset originally consisted of 147 records for PD and 48 for HC. However, observing Figure 6, it becomes evident that the balanced dataset contributes to more accurate and reliable model training, ensuring fair and unbiased classification outcomes.

Figure 6
Histogram and KDE of the target (variable status): (a) histogram of the status; (b) KDE of the status

The application of SMOTE effectively balanced the class distribution, addressing the initial data imbalance and creating a more equitable representation of both classes.

The Figure 7 depicts the KDE distribution plots generated by the EDA function, presenting a specific numerical variable divided into classes 0 and 1. To highlight the ease of generating these graphics, our specific focus was on the variable "spread1" before (a) and after (b) applying the SMOTE parameter in the setup function.

Figure 7
KDE distribution plots of the variable spread1 by classes: (a) before the utilization of the SMOTE parameter; (b) after the utilization of the SMOTE parameter in the setup function.

The results of the 17 algorithms applied to the training data after implementing the models with default parameters using the compare_models function of the PyCaret library are shown in Table 3. This evaluation adopts a broad spectrum of metrics, Accuracy, Recall, Precision, F1-Score, AUC, and MCC to provide a robust assessment of model performance. Specifically, the AUC is measured on a scale from 0 to 1, with 1 indicating perfect model discrimination, while MCC values range from -1 to 1, reflecting the quality of binary classifications from perfect agreement to complete discord.

Thumbnail

Table 3
Performance analysis of 17 models applied to training data.

The create_model function conducts cross-validation [⁴], [⁵], [¹¹], [¹³] based on a pre-defined number of partitions set in the setup function. After the validation process, it provides a performance metric for each test partition. This function was employed to create models with the best results for the algorithms listed in Table 3. The graphs presented in Figure 8, Figure 9 and Figure 10 were generated using a single line of code, utilizing the plot_model function from the PyCaret library. This powerful function allows for effortless creation of informative visualizations, providing valuable insights into the model's behavior, performance, and predictive capabilities.

Figure 8
Confusion matrix by the tuned model: (a) KNN, (b) ETC and (c) GBC for the testing data

Figure 9
ROC Curve for KNN model after the optimization process by F1-Score metric applied to the testing data.

Figure 10
Feature importance plot: (a) ETC model; (B) GBC model.

Considering the values presented in Figure 8 for the tuned KNN, ETC and GBC model over the testing data with the expressions for Accuracy, Recall, Precision, F1-Score, and MCC defined in [⁷,⁹,³²], it was possible to calculate these values for the top three models.

Utilizing the numerical results provided in Figure 8, and applying the mathematical expressions denoted in Equations 1 through 5 for accuracy, recall, precision, F1-score, and MCC respectively, we derived the corresponding metric values for the optimized GBC model when evaluated against the testing dataset.

A c c = \frac{43 + 12}{43 + 12 + 1 + 3} = 93.22 %

R e c a l l = \frac{43}{43 + 1} = 97.73 %

P r e c i s i o n = \frac{43}{43 + 3} = 93.48 %

F 1 - S c o r e = 2 * \frac{0.9348 * 0.9773}{(0.9348 + 0.9773)} = 95.56 %

M C C = \frac{12 * 433 - 1 * 3}{\sqrt{(43 + 3) (43 + 1) (12 + 3) (12 + 1)}} = 0.8166

For all the other metrics and models presented in Table 4, the calculations are based on the values from each confusion matrix depicted in Figure 8. The formulas utilized for these calculations and more details are available in [⁷,²⁶,³⁰,³³]. These metrics provide additional insights into the model's performance, complementing the evaluation provided by the confusion matrix.

Thumbnail

Table 4
Performance analysis, by accuracy, for the top three models applied to training and testing data after the optimization by F1-Score

As previously stated in this article, higher AUC values indicate superior classification performance, with a value of 1.0 representing a perfect classifier and a value of 0.5 suggesting a classifier that performs no better than random chance. In Figure 9, we specifically present the ROC curve for the KNN model, which achieved an impressive AUC value of 0.99. This demonstration highlights the ease with which the ROC curve can be generated using the plot_model function.

The feature selection process began with the initial set of 22 acoustic features, aiming to develop a concise and effective model. To fine-tune this process, adjustments were made to the remove_multicollinearity parameter within the CFS technique. This parameter modification enhanced the selection process, leading to a reduction in the number of acoustic features from 22 to 14. Subsequently, these refined features were evaluated across 17 distinct classification methods. The Feature Importance plot provides a score for each feature, with higher scores indicating greater relevance to the output data variable [⁵,⁷]. Notably, despite considering only 14 features after applying the CFS technique, the feature importance graph highlights the top 10 ranked features for each algorithm depicted in Figure 10, namely the ETC and the GBC.

Table 4 displays the three models with the best performance on the testing data, considering the standard hyperparameters of the tune_model function. By determining the ideal configuration of hyperparameter tuning, this function improves the performance of the classification model [²⁰,²¹]. With respect to the accuracy metric, in this paper were used the F1-Score like [⁵], because this represents the balance between precision and recall comparing the way each algorithm performed on the testing data.

The results presented in Table 4 illustrate the performance and behavior of the classification models when applied to the predict_model function without any alterations or parameter adjustments. This analysis allows us to gain valuable insights into how well the models generalize to unseen data and provides a comprehensive evaluation of their predictive capabilities. Table 5 displays the results of eleven papers that utilized the same dataset and applied various algorithms for data classification, following a similar approach to our study. Each paper employed different techniques and hyperparameter optimization methods to achieve their results.

Thumbnail

Table 5
Comparison of the accuracy between the proposed method at this work and previous works that apply classify to the same dataset.

Notably, only the papers authored by [³³] and [²²] adopted our proposal of utilizing SMOTE to balance the dataset class, searching enhancing the accuracy and effectiveness of the classification models used in them analyzes considering the same dataset.

The studies referenced in the analysis presented in Table 5 were meticulously curated to demonstrate a spectrum of methodologies applied to the common dataset over a significant timeline. This chronological analysis started with the work proposed by Gil and Johnson (2009) [³³] and advanced through to study of Raya (2023) [²²], encompassing the methodological progression within the domains of data processing and machine learning. Our selection criteria centered on methodological diversity, including a variety of feature selection techniques and classification strategies to provide a thorough overview of both advancements and established practices in the field. This carefully chosen collection of research not only traces the trajectory of accuracy improvements over time but also highlights the distinctive impact of varied techniques alongside the use of distinct methodologies. Including studies with different configurations for train-test splits and sample balancing methods, such as SMOTE, establishes a robust framework to evaluate the effectiveness of the feature selection methodology adopted in our research.

DISCUSSION

In this study, we introduce an objective and automated classification method utilizing a low-code machine learning library in Python. The analysis involves considering features extracted from speech data in an unbalanced dataset comprising both healthy individuals and patients with PD.

Initially, data preprocessing was performed using functions from the Pandas library. The CSV file was imported, and initial treatments included excluding the ‘name’ variable, which contained recording number for each patient [³³].

In the discussion of our findings, it is pertinent to revisit the statistical analyses presented in Figure 1. Initially, 'spread2' appeared to follow a normal distribution, a notion confirmed by the Shapiro-Wilk test, which yielded a p-value of 0.6545, affirming its normality. This aligns with expectations based on the feature's histogram and Q-Q plot. Conversely, ‘MDVP:RAP’ exhibited a significant deviation from normality, a finding supported by a Shapiro-Wilk test p-value of virtually zero (p = 6.9969e-20). Although reporting the test statistic is less common in literature, the extreme p-value provides unequivocal evidence against the hypothesis of normal distribution for 'MDVP:RAP'. Understanding these results is crucial for grasping the dataset's underlying structure observed in this study. This fact motivated the selection of the ‘normalize = True’ and ‘normalize_method = zscore’ parameters in the setup function, as detailed in Figure 5, ensuring the data is appropriately scaled and normalized for subsequent analytical procedures.

The study proposed by Little (2009) [²³] presented the probability distribution for normalized variables, serving as a reference point for our investigation. However, our research aimed to further elucidate the necessity of data normalization, showing the probability distribution of different variables, such as ‘MDVP:RAP’ and ‘spread2’, showing that some follow a Gaussian distribution, and some don’t, hence the need for normalization. Specifically, we conducted exploratory data analysis on variables such as ‘PPE x RPDE’, ‘PPE x spread1’, ‘MDVP:RAP’ x ‘MDVP:PPQ’ and ‘PPE x HNR’. These variables were chosen to complement the findings of the work proposed by [²³], which presents the analysis about the variables ‘MDVP:Jitter(Abs)’ x ‘MDVP:Jitter(%)’ and ‘PPE x DFA’ after the normalization process and provide additional insights into the impact of normalization on data distribution and analysis.

Observing the scatter plot analyses from Figure 2, we have distinct correlation strengths among the acoustic features of the dataset. Notably, 'PPE' and 'RPDE' exhibit a moderate correlation (r = 0.5459), suggesting a meaningful but less relationship. In contrast, a strong correlation is evident both between 'PPE' and 'spread1' (r = 0.9624), and 'MDVP:RAP' and 'MDVP:PPQ' (r = 0.9573), indicating that these pairs of features may share a common underlying factor. The negative correlation between 'PPE' and 'HNR' (r = -0.6929) underscores the inverse relationship in their behavior. These insights led to the decision to enable the ‘remove_multicollinearity = True’ parameter in the setup function, aiming to mitigate the impact of multicollinearity on the predictive models developed within this study.

After the EDA analysis and the preprocessing, data configuration was accomplished by assigning parameters using the setup function, as shown in Figure 5. The parameter ‘train_size = 0.70’ was set to compare the results of this study with those obtained by previous works using the same dataset [⁹,²²,³³,⁴¹]. Additionally, the parameter ‘session_id = 1245’ was set to ensure reproducibility of results, a good practice in research. The data was further normalized (zero mean, unit variance) [⁵] using the parameters ‘normalize = True’ and ‘normalize_method = zscore’.

During data preprocessing and EDA, it was observed that the dataset was unbalanced, with more records of PD patients than healthy patients. As shown in Figure 7, a KDE plot of the class distribution for the variable ‘spread1’ before and after applying the SMOTE parameter in the setup function was created to address this issue. The dataset's imbalance was addressed by applying filters to the "status" variable. The over sample technique SMOTE generates samples for the imbalanced class, resulting in a balanced distribution. The KDE plots in Figure 7 (a) and (b) provide valuable insights into the distribution of numerical variables for both classes, facilitating visual evaluation of each class before and after the SMOTE technique. The histogram presented in Figure 6 illustrates a balanced sampling for classes 0 (Healthy) and 1 (PD), ensuring fair and unbiased classification outcomes [²⁰].

One other crucial parameter considered in the initial data configuration using the setup function was remove_multicollinearity = 'True', with a multicollinearity threshold of 0.96. This parameter was set to address the expectation of many ML models to receive normalized data within the same scale. By setting remove_multicollinearity = 'True', the setup function does not present all variables to the models used in the paper, as it automatically removes highly correlated variables. The number of acoustic variables was reduced to 14, with the excluded variables being ‘MDVPJitter(%)’, ‘MDVPRAP’, ‘MDVPShimmer(dB)’, ‘MDVPAPQ’, ‘Jitter DDP’, ‘Shimmer APQ3’, ‘Shimmer APQ5’, ‘Shimmer DDA’. The application of CFS, graphically detailed in Figure 4, showcases the feature space reduction from 22 to 14 acoustic features, endorsing the effectiveness of our feature selection strategy. The initial analysis involved applying the PyCaret library models to classify the 14 acoustic features, resulting in a dataframe with values for the Accuracy, Recall, Precision, F1-Score, and MCC metrics which values are presented in Table 3.

It is noteworthy that the three machine learning models exhibiting the highest accuracy on the training data are the ETC with an accuracy of 91.87%, CATBOOST with 90.33%, and RF with 89.62%. However, not all of them necessarily translate to the best performance following the hyperparameter optimization process and their subsequent application to the test data.

After this step all 17 ML models had their hyperparameters optimized through the tune_model function of the PyCaret library considering the default parameters of this function. The models after an optimal combination of its hyperparameters were applied to the testing data by predict_model function and of which the three models KNN, ETC and GBC stood out with being those with the highest value for Accuracy and for the F1-Score metric.

The AUC results obtained in our experiments were comparable to those of other studies in the literature [²⁹,³⁰,³²]. After optimization, GBC, ETC, and KNN algorithms produced better overall AUC results. The ROC curves for KNN in Figure 9 demonstrate the significance of these results, with values in the interval I = [0,¹], and the upper limit of this interval considered optimal. The Figure 10 illustrates how each algorithm utilizes different features during the learning process. While some features were common in both ETC and GBC algorithms, such as ‘PPE’, ‘spread1’, ‘spread2’, ‘MDVPShimmer’, ‘MDVPFo(Hz)’, ‘MDVPFlo(Hz)’, ‘MDVPPPQ’ and ‘D2’ they were assigned different weights.

Applying the tune_model function to all models, we obtained the confusion matrix presented in Figure 8 for the best three models, achieving an accuracy of 93.22% for GBC, 94.92% for ETC, and 98.30% for KNN on the testing data. Other metrics, including Recall, Precision, F1-Score, and MCC, also displayed significant values, and the best three models exhibited improved performance compared to their training data results after the optimization process.

Chronological advancements in algorithm application to the dataset used in this study have been marked by a variety of methodological approaches. Commencing with Gil and Johnson (2009) [³³], the exploratory foundation was set with two algorithms MLP and SVM, securing accuracies between 92.31% and 93.33%. Das (2010) [⁴¹], on the other hand, introduced four algorithms, Neural Network, DMNeural, Regression and DT, with performance levels spanning between 84.30% and 92.90%. Progressing in complexity, Yang S, Zheng F, Luo X and coauthors (2014) [⁴⁰] witch accuracies between 79.00% and 91.80% and Al-Fatlawi and coauthors (2016) [³⁹] with the DBN get accuracy of 94.00%, demonstrating the critical impact of algorithmic choice on model performance with a range of accuracies.

Moving forward, Lahmiri and coauthors (2018) [³⁷] presented SVM, NB, RT, KNN and the LDA method, followed by Marar and coauthors (2018) [³⁸] who incorporated ANN RF, SVM, K(10)NN and Kernel SVM, both contributing to the diversification of computational strategies with accuracies between 69.00% and 94.87%. Kadam and Jadhav (2019) [³⁶] continued this trajectory by implementing DNN and FESA-DNN, reflecting ongoing innovation with an achieved accuracy of 92.19% and 93.48%.

The implementation of PCA by Anila M (2020) [³⁴] and Rasheed (2020) [³⁵] further honed feature reduction techniques, reaching up to 97.50% accuracy. Senturk (2020) [⁹] pursued this with the evaluation of three algorithms, CART, SVM and ANN, maintaining a high accuracy spectrum from 90.76% to 93.84%. Most recently, Raya (2023) [²²] utilized a suite of five algorithms, KNN, SVM, DTC, RFC and MLP, achieving an impressive range of 90.76% to 98.31% accuracy, marking the highest reported performance to date.

The performance of our models demonstrated a significant advancement over the previously reported methodologies. Our RFC achieved an accuracy of 93.22%, exceeding Marar and coauthors (2018) [³⁸] 87.17%, Anila M's (2020) [³⁴] 91.52%, and Raya's (2023) [²²] 92.00% for similar models. This not only reflects the robustness of our model but also the effectiveness of our feature engineering and optimization processes. The ETC obtained an accuracy of 94.92%, substantially outperforming Das's DT model at 84.30% and Raya's (2023) [²²] DT at 90.00%, showcasing the potency of our approach in extracting predictive insights from the dataset. Furthermore, our KNN model set a new benchmark with an accuracy of 98.30%, surpassing the KNN accuracies of Marar and coauthors (2018) [³⁸] at 84.61%, Lahmiri and coauthors (2018) [³⁷] at 69.00%, Anila M (2020) [³⁴] at 86.44%, and Raya (2023) [²²] at 90.76%. This evidences the superior data handling and classification capabilities inherent in our model. Conversely, MLP yielded an accuracy of 91.53%, which, while formidable, did not quite reach the heights of Gil and Johnson's (2009) [³³] 92.31% or Raya's (2023) [²²] 98.31%.

In our comprehensive review of 11 pertinent studies, we observed that 8 implemented some form of feature removal to enhance model performance, emphasizing the critical role of feature selection in predictive accuracy. For instance, Raya (2023) [²²] utilized 8 features, whereas our work incorporated 14 features, optimizing the balance between model complexity and performance. Senturk (2020) [⁹] demonstrated this balance by employing 7 features through Feature Importance and 13 via Recursive Feature Elimination, showcasing varied approaches within a single study. Similarly, Rasheed (2020) [³⁵] utilized 15 features, Al-Fatlawi and coauthors (2016) [³⁹] 16 features, and Anila M (2020) [³⁴] 17 features.

The results observed in your study offer a comprehensive analysis by evaluating several models, including post-application of feature selection techniques conducted on the initial set of 22 acoustic features, as demonstrated in Figure 3, to refine and optimize each model, alongside the SMOTE technique to minimize the imbalance between classes. This meticulous approach, encompassing adjustments such as removing multicollinearity and parameter optimization for each of the proposed models, culminated in the data presented in Table 5. By providing this extensive comparison, we aim to contribute scientifically to the body of literature that often relies on a limited number of machine learning models applied to similar datasets, such as [⁴,⁹,³³,³⁵,³⁷,³⁸,⁴¹] and validate the use of the proposed low-code approach.

It is noteworthy that despite the excellent results obtained in our study with the dataset proposed by [²³], compared to other works employing the same dataset as highlighted in Table 5, it is crucial to acknowledge that in a real clinical scenario, the detection of PD based solely on a single vowel, as observed in the dataset of this study, is unlikely. A recent study [⁴²] demonstrated that the predominant voice disorder is just one subtype of speech found in newly diagnosed PD cases, emphasizing the necessity for further analysis and validation of the results presented in our research.

In conclusion, our findings indicated that all models achieved excellent performance, with F1-Score values ranging from 93.48% to 98.85%, representing a balance between precision and recall. It is essential to consider that certain modeling choices, such as the size of training and testing data and the number of folds, shown in Figure 5, may impact the performance of some algorithms and lead to variations in the results presented in this paper. Nonetheless, the machine learning libraries used in this study demonstrate their potential as tools for assisting clinicians in the accurate diagnosis of Parkinson's disease. The framework for classification modeling with voice analysis using the PyCaret library in Python can be easily replicated in future investigations.

LIMITATIONS OF THE STUDY

Our investigation is subject to several limitations, with the primary concern being the restricted sample size, particularly evident in the comparatively imbalanced healthy control group compared to the PD group; to address this imbalance, we employed the SMOTE. However, while this technique was effective in balancing the dataset and improving model performance, it is important to acknowledge that the use of synthetic data can introduce biases, especially in clinical applications where the natural variability of real-world data is critical. The generation of synthetic samples may not fully capture the complexity of the original data, potentially affecting the models' ability to generalize when applied to new data. Additionally, the limited sample size, with 195 samples from 31 subjects, raises concerns regarding overfitting and the generalization of the models, especially considering that the same subjects are represented multiple times. This limitation is particularly critical, as it may unduly influence the models' performance and predictive capabilities. An external validation is imperative to assess the generalizability and performance of the proposed model outlined in our study, thus avoiding potential overfitting when introducing data from new speakers to the models [⁴,⁴³].

Although the models showcased in this article demonstrated relatively high classification accuracy, the analysis of feature importance should be approached with caution, as the ranking of importance may undergo variations when the study is replicated in a larger cohort. Another noteworthy limitation arises from the exclusive inclusion of subjects with PD and healthy controls in our dataset. It remains uncertain whether our model possesses the capability to discriminate between individuals with PD and those afflicted by other conditions that may impact speech [²⁸]. Speech, by itself, cannot be the sole determinant for an accurate diagnosis of Parkinson's disease; other symptoms should be collectively considered to enhance the diagnostic reliability. However, it could work as a supportive tool to diagnose it.

It has been demonstrated that there’s a correlation between voice patterns and the age of individuals with Parkinson's disease [⁴⁴]. However, our focus on acoustic features alone stems from the study's objective, which aimed to investigate the validity of an application of a low-code library for PD diagnosis. This approach facilitated a direct comparison with prior studies utilizing the same dataset.

Another point to be considered is the use of PyCaret's default settings for training and validating the machine learning models. We chose this approach to ensure the replicability of the process and ease of implementation, given that PyCaret is a low-code tool that is easy to use and accessible. However, we recognize that this choice may not fully exploit the maximum potential of each model. More specific libraries, which allow for finer and more customized hyperparameter tuning, could potentially improve the performance of certain models for this task. Future studies could explore this possibility to further optimize the results.

However, variables such as the Hoehn & Yahr stage and the gender of the subjects were not incorporated into the analysis, even [⁴⁵] showing that there is a correlation between speech and motor symptoms and relation to gender and PD by [¹⁴], was preserved the original dataset, given our study's specific focus and objectives.

CONCLUSION

Considering the initial aspects and results observed in this paper, it can be affirmed that Machine Learning is capable of contributing as a tool for the analysis and interpretation of data from patients with PD, to understand and classify their signs and symptoms, thus contributing to the advancement of Medicine. In this brief study on the use of ML for the analysis of the voice record of patients with PD, 17 classification models present in the PyCaret library were observed, among which the three best models for the classification of the status of patients, for the analyzed dataset, were: the model GBC, ETC, and KNN, with accuracy, precision, and F1-Score greater than 93%. It is important to give special emphasis to the GBC models that went from an accuracy of 88.13% in relation to the training data to a value of 93.22% on the testing data which is equivalent to a variation in performance of 5.78% and the KNN model, which presented an accuracy of 85.22% on the training data to a value of 98.30% on the testing data equivalent to a variation of 15.35%. At this point, it is important to emphasize that the entire process of comparison and optimization of the models is made considering standardized values of the PyCaret library to allow its replication to validate the results obtained in this paper. The use of a low-code tool in our study has produced remarkable and comparable outcomes to those referenced in Table 5. This approach underscores the effectiveness of leveraging advanced technology to address complex challenges efficiently and our results demonstrated the potential of low-code tools in delivering substantial value, achieving outcomes consistent with, and sometimes even exceeding, those of the referenced works. The findings observed in this paper are the first step to demonstrate the potential of using Machine Learning libraries such as PyCaret as a complementary tool to clinical practice, since the low-code application of the techniques are relatively simple, employing low code, and even so, achieved significant performance regarding the classification of PD patients. In this way, Machine Learning may, in the future, be clinically useful for decision-making about the clinical status of the patient, helping doctors and health professionals to make a more assertive decision about the real clinical status, thus providing a better quality of life to the patient. Furthermore, in future endeavors, this approach is expected to be enhanced by incorporating specially tailored applications within a web-based framework. This advancement will empower users worldwide to capture brief speech samples from individuals and analyze the gathered vocal data, thereby contributing to the initial identification of potential PD cases, as indicated by the studies [¹⁶,⁴⁶]. Additionally, an application will facilitate the recording of patients' voices and the extraction of voice features, which will then be transmitted to a web application capable of utilizing this data to feed machine learning models for the prediction of Parkinson's disease patients based on voice features. Moreover, in conjunction with voice data, other datasets collected through non-invasive procedures will be analyzed using computer vision techniques. This comprehensive approach will enhance the knowledge generated by this study and pave the way for the development of a user-friendly, non-invasive diagnostic web service for Parkinson's disease utilizing artificial intelligence.

Acknowledgments:

The authors of this work are grateful the National Council for Scientific and Technological Development (CNPq); the Coordination for the Improvement for Higher Level Personnel (CAPES); the Ministry of Science, Technology, Innovation, and Communications (MCTIC); the Research Foundation of the State of Minas Gerais (FAPEMIG); the Federal Institute of Education, Science, and Technology Goiano (IF Goiano) - Campus Rio Verde and Campus Cristalina and Centro de Excelência em Agricultura Exponencial (CEAGRE), for the financial and structural support to conduct this study; and the UCI Machine Learning Repository for making available the database used in this study. A. A. Pereira is a research productivity fellow at CNPq, Brazil (309525/2021-7).

Ethics approval and consent to participate.

The data was obtained from the “Oxford Parkinson’s Disease Detection Dataset with Acoustic Features”, which was donated to the UCI Machine Learning Repository not including individual level identifiers.

REFERENCES

¹ Avuçlu E, Elen A. Evaluation of train and test performance of machine learning algorithms and Parkinson diagnosis with statistical measurements. Med Biol Eng Comput 2020;58:2775-88. https://doi.org/10.1007/s11517-020-02260-3
» https://doi.org/10.1007/s11517-020-02260-3
² Dickson JM, Grünewald RA. Somatic symptom progression in idiopathic Parkinson’s disease. Parkinsonism Relat Disord 2004;10:487-92. https://doi.org/10.1016/j.parkreldis.2004.05.005
» https://doi.org/10.1016/j.parkreldis.2004.05.005
³ Ray Dorsey E, Elbaz A, Nichols E, Abd-Allah F, Abdelalim A, Adsuar JC, et al. Global, regional, and national burden of Parkinson’s disease, 1990-2016: a systematic analysis for the Global Burden of Disease Study 2016. Lancet Neurol 2018;17:939-53. https://doi.org/10.1016/S1474-4422(18)30295-3
» https://doi.org/10.1016/S1474-4422(18)30295-3
⁴ Karabayir I, Goldman SM, Pappu S, Akbilgic O. Gradient boosting for Parkinson’s disease diagnosis from voice recordings. BMC Med Inform Decis Mak 2020;20:228. https://doi.org/10.1186/s12911-020-01250-7
» https://doi.org/10.1186/s12911-020-01250-7
⁵ Rehman RZU, Del Din S, Guan Y, Yarnall AJ, Shi JQ, Rochester L. Selecting Clinically Relevant Gait Characteristics for Classification of Early Parkinson’s Disease: A Comprehensive Machine Learning Approach. Sci Rep 2019;9:17269. https://doi.org/10.1038/s41598-019-53656-7
» https://doi.org/10.1038/s41598-019-53656-7
⁶ Jankovic J. Parkinson’s disease: Clinical features and diagnosis. J Neurol Neurosurg Psychiatry 2008;79:368-76. https://doi.org/10.1136/jnnp.2007.131045
» https://doi.org/10.1136/jnnp.2007.131045
⁷ Jain D, Singh V. Feature selection and classification systems for chronic disease prediction: A review. Egypt. Inform. J. 2018;19:179-89. https://doi.org/10.1016/j.eij.2018.03.002
» https://doi.org/10.1016/j.eij.2018.03.002
⁸ Malvea A, Babaei F, Boulay C, Sachs A, Park J. Deep brain stimulation for Parkinson’s Disease: A Review and Future Outlook. Biomed Eng Lett 2022;12:303-16. https://doi.org/10.1007/s13534-022-00226-y
» https://doi.org/10.1007/s13534-022-00226-y
⁹ Karapinar Senturk Z. Early diagnosis of Parkinson’s disease using machine learning algorithms. Med Hypotheses 2020;138:109603. https://doi.org/10.1016/j.mehy.2020.109603
» https://doi.org/10.1016/j.mehy.2020.109603
¹⁰ Sakar CO, Serbes G, Gunduz A, Tunc HC, Nizam H, Sakar BE, et al. A comparative analysis of speech signal processing algorithms for Parkinson’s disease classification and the use of the tunable Q-factor wavelet transform. Appl Soft Comput 2019;74. https://doi.org/10.1016/j.asoc.2018.10.022
» https://doi.org/10.1016/j.asoc.2018.10.022
¹¹ Jeancolas L, Mangone G, Petrovska-Delacrétaz D, Benali H, Benkelfat B-E, Arnulf I, et al. Voice characteristics from isolated rapid eye movement sleep behavior disorder to early Parkinson’s disease. Parkinsonism Relat Disord 2022;95:86-91. https://doi.org/10.1016/j.parkreldis.2022.01.003
» https://doi.org/10.1016/j.parkreldis.2022.01.003
¹² Naranjo L, Pérez CJ, Campos-Roca Y, Martín J. Addressing voice recording replications for Parkinson’s disease detection. Expert Syst Appl 2016;46:286-92. https://doi.org/10.1016/j.eswa.2015.10.034
» https://doi.org/10.1016/j.eswa.2015.10.034
¹³ Rusz J, Hlavnička J, Novotný M, Tykalová T, Pelletier A, Montplaisir J, et al. Speech Biomarkers in Rapid Eye Movement Sleep Behavior Disorder and Parkinson Disease. Ann Neurol 2021;90:62-75. https://doi.org/10.1002/ana.26085
» https://doi.org/10.1002/ana.26085
¹⁴ Rusz J, Tykalová T, Novotný M, Zogala D, Růžička E, Dušek P. Automated speech analysis in early untreated Parkinson’s disease: Relation to gender and dopaminergic transporter imaging. Eur J Neurol 2022;29:81-90. https://doi.org/10.1111/ene.15099
» https://doi.org/10.1111/ene.15099
¹⁵ Joshi D, Khajuria A, Joshi P. An automatic non-invasive method for Parkinson’s disease classification. Comput Methods Programs Biomed 2017;145:135-45. https://doi.org/10.1016/j.cmpb.2017.04.007
» https://doi.org/10.1016/j.cmpb.2017.04.007
¹⁶ Tougui I, Jilbab A, Mhamdi J El. Machine Learning Smart System for Parkinson Disease Classification Using the Voice as a Biomarker. Healthc Inform Res 2022;28:210-21. https://doi.org/10.4258/hir.2022.28.3.210
» https://doi.org/10.4258/hir.2022.28.3.210
¹⁷ Visibelli A, Roncaglia B, Spiga O, Santucci A. The Impact of Artificial Intelligence in the Odyssey of Rare Diseases. Biomedicines 2023;11:887. https://doi.org/10.3390/biomedicines11030887
» https://doi.org/10.3390/biomedicines11030887
¹⁸ Hastie T, Tibshirani R, Friedman J. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. vol. I. 2nd Ed. Springer; 2009.
¹⁹ Rashidi HH, Tran NK, Betts EV, Howell LP, Green R. Artificial Intelligence and Machine Learning in Pathology_ The Present Landscape of Supervised Methods _ Enhanced Reader. Acad Pathol 2019;6. https://doi.org/10.1177/2374289519873088
» https://doi.org/10.1177/2374289519873088
²⁰ Moez Ali. PyCaret: An open source, low-code machine learning library in Python 2020. https://www.pycaret.org (Accessed January 28, 2023).
» https://www.pycaret.org
²¹ Tolios G. Simplifying Machine Learning with PyCaret A Low-code Approach for Beginners and Experts! . vol. 1. 1st ed. Leanpub book; 2022.
²² Alshammri R, Alharbi G, Alharbi E, Almubark I. Machine learning approaches to identify Parkinson’s disease using voice signal features. Front Artif Intell 2023;6. https://doi.org/10.3389/frai.2023.1084001
» https://doi.org/10.3389/frai.2023.1084001
²³ Little MA, McSharry PE, Hunter EJ, Spielman J, Ramig LO. Suitability of dysphonia measurements for telemonitoring of Parkinson’s disease. IEEE Trans Biomed Eng 2009;56:1015-22. https://doi.org/10.1109/TBME.2008.2005954
» https://doi.org/10.1109/TBME.2008.2005954
²⁴ Little M. Parkinsons. UCI Machine Learning Repository, 2008. https://doi.org/https://doi.org/10.24432/C59C74
» https://doi.org/https://doi.org/10.24432/C59C74
²⁵ Qasim HM, Ata O, Ansari MA, Alomary MN, Alghamdi S, Almehmadi M. Hybrid Feature Selection Framework for the Parkinson Imbalanced Dataset Prediction Problem. Medicina (B Aires) 2021;57:1217. https://doi.org/10.3390/medicina57111217
» https://doi.org/10.3390/medicina57111217
²⁶ Jatoth C, Neelima E, Mayuri AVR, Rao AS. Effective monitoring and prediction of Parkinson disease in Smart Cities using intelligent health care system. Microprocess Microsyst 2022;92:104547. https://doi.org/10.1016/j.micpro.2022.104547
» https://doi.org/10.1016/j.micpro.2022.104547
²⁷ Vabalas A, Gowen E, Poliakoff E, Casson AJ. Machine learning algorithm validation with a limited sample size. PLoS One 2019;14:e0224365. https://doi.org/10.1371/journal.pone.0224365
» https://doi.org/10.1371/journal.pone.0224365
²⁸ Guimaraes MT, Medeiros AG, Almeida JS, Falcao y Martin M, Damasevicius R, Maskeliunas R, et al. An Optimized Approach to Huntington’s Disease Detecting via Audio Signals Processing with Dimensionality Reduction. 2020 International Joint Conference on Neural Networks (IJCNN), IEEE; 2020, p. 1-8. https://doi.org/10.1109/IJCNN48605.2020.9206773
» https://doi.org/10.1109/IJCNN48605.2020.9206773
²⁹ Suppa A, Costantini G, Asci F, Di Leo P, Al-Wardat MS, Di Lazzaro G, et al. Voice in Parkinson’s Disease: A Machine Learning Study. Front Neurol 2022;13. https://doi.org/10.3389/fneur.2022.831428
» https://doi.org/10.3389/fneur.2022.831428
³⁰ Medeiros TA, Saraiva Junior RG, Cassia G de S e, Nascimento FA de O, Carvalho JLA de. Classification of 1p/19q Status in Low-Grade Gliomas: Experiments with Radiomic Features and Ensemble-Based Machine Learning Methods. Braz. Arch. Biol. Technol. 2023;66. https://doi.org/10.1590/1678-4324-2023230002
» https://doi.org/10.1590/1678-4324-2023230002
³¹ Prati RC, Batista GE de APA, Monard MC. Curvas ROC para avaliação de classificadores. IEEE Lat. Am. Trans, 2008;6(2):215-22. https://doi.org/10.1109/TLA.2008.4609920
» https://doi.org/10.1109/TLA.2008.4609920
³² Bao G, Lin M, Sang X, Hou Y, Liu Y, Wu Y. Classification of Dysphonic Voices in Parkinson’s Disease with Semi-Supervised Competitive Learning Algorithm. Biosensors (Basel) 2022;12:502. https://doi.org/10.3390/bios12070502
» https://doi.org/10.3390/bios12070502
³³ Gil D, Johnson M. Diagnosing Parkinson by using Artificial Neural Networks and Support Vector Machines. Glob. J. Comput. Sci. Technol. 2009;9:63-71.
³⁴ Anila M, Pradeepini G. Diagnosis of Parkinson’s Disease using Artificial Neural Network. Int. J. Emerg. Trends Eng. Res. 2020;8:3700-7. https://doi.org/10.30534/ijeter/2020/131872020
» https://doi.org/10.30534/ijeter/2020/131872020
³⁵ Rasheed J, Hameed AA, Ajlouni N, Jamil A, Ozyavas A, Orman Z. Application of Adaptive Back-Propagation Neural Networks for Parkinson’s Disease Prediction. 2020 International Conference on Data Analytics for Business and Industry: Way Towards a Sustainable Economy (ICDABI), IEEE; 2020, p. 1-5. https://doi.org/10.1109/ICDABI51230.2020.9325709
» https://doi.org/10.1109/ICDABI51230.2020.9325709
³⁶ Kadam VJ, Jadhav SM. Feature Ensemble Learning Based on Sparse Autoencoders for Diagnosis of Parkinson’s Disease, 2019, p. 567-81. https://doi.org/10.1007/978-981-13-1513-8_58
» https://doi.org/10.1007/978-981-13-1513-8_58
³⁷ Lahmiri S, Dawson DA, Shmuel A. Performance of machine learning methods in diagnosing Parkinson’s disease based on dysphonia measures. Biomed Eng Lett 2018;8:29-39. https://doi.org/10.1007/s13534-017-0051-2
» https://doi.org/10.1007/s13534-017-0051-2
³⁸ Marar S, Swain D, Hiwarkar V, Motwani N, Awari A. Predicting the occurrence of Parkinson’s Disease using various Classification Models. 2018 International Conference on Advanced Computation and Telecommunication (ICACAT), IEEE; 2018, p. 1-5. https://doi.org/10.1109/ICACAT.2018.8933579
» https://doi.org/10.1109/ICACAT.2018.8933579
³⁹ Al-Fatlawi AH, Jabardi MH, Ling SH. Efficient diagnosis system for Parkinson’s disease using deep belief network. 2016 IEEE Congress on Evolutionary Computation (CEC), IEEE; 2016, p. 1324-30. https://doi.org/10.1109/CEC.2016.7743941
» https://doi.org/10.1109/CEC.2016.7743941
⁴⁰ Yang S, Zheng F, Luo X, Cai S, Wu Y, Liu K, et al. Effective Dysphonia Detection Using Feature Dimension Reduction and Kernel Density Estimation for Patients with Parkinson’s Disease. PLoS One 2014;9:e88825. https://doi.org/10.1371/journal.pone.0088825
» https://doi.org/10.1371/journal.pone.0088825
⁴¹ Das R. A comparison of multiple classification methods for diagnosis of Parkinson disease. Expert Syst Appl 2010;37:1568-72. https://doi.org/10.1016/j.eswa.2009.06.040
» https://doi.org/10.1016/j.eswa.2009.06.040
⁴² Rusz J, Tykalova T, Novotny M, Zogala D, Sonka K, Ruzicka E, et al. Defining Speech Subtypes in De Novo Parkinson Disease. Neurology 2021;97:e2124-35. https://doi.org/10.1212/WNL.0000000000012878
» https://doi.org/10.1212/WNL.0000000000012878
⁴³ Rusz J, Švihlík J, Krýže P, Novotný M, Tykalová T. Reproducibility of Voice Analysis with Machine Learning. Mov. Disord. 2021;36:1282-3. https://doi.org/10.1002/mds.28604
» https://doi.org/10.1002/mds.28604
⁴⁴ Skodda S, Visser W, Schlegel U. Gender-Related Patterns of Dysprosody in Parkinson Disease and Correlation Between Speech Variables and Motor Symptoms. J. Voice 2011;25:76-82. https://doi.org/10.1016/j.jvoice.2009.07.005
» https://doi.org/10.1016/j.jvoice.2009.07.005
⁴⁵ Rusz J, Tykalová T, Novotný M, Růžička E, Dušek P. Distinct patterns of speech disorder in early-onset and late-onset de-novo Parkinson’s disease. NPJ Parkinsons Dis 2021;7:98. https://doi.org/10.1038/s41531-021-00243-1
» https://doi.org/10.1038/s41531-021-00243-1
⁴⁶ Rahman W, Lee S, Islam MS, Antony VN, Ratnu H, Ali MR, et al. Detecting Parkinson Disease Using a Web-Based Speech Task: Observational Study. J Med Internet Res 2021;23:e26305. https://doi.org/10.2196/26305
» https://doi.org/10.2196/26305

Editor-in-Chief: Alexandre Rasi Aoki

Associate Editor: Raja Soosaimarian Peter Raj

Publication Dates

Publication in this collection
24 Mar 2025
Date of issue
2025

History

Received
15 Aug 2023
Accepted
21 Nov 2024

This is an article published in open access under a Creative Commons license.

[1] ¹ Avuçlu E, Elen A. Evaluation of train and test performance of machine learning algorithms and Parkinson diagnosis with statistical measurements. Med Biol Eng Comput 2020;58:2775-88. https://doi.org/10.1007/s11517-020-02260-3
» https://doi.org/10.1007/s11517-020-02260-3

[2] ² Dickson JM, Grünewald RA. Somatic symptom progression in idiopathic Parkinson’s disease. Parkinsonism Relat Disord 2004;10:487-92. https://doi.org/10.1016/j.parkreldis.2004.05.005
» https://doi.org/10.1016/j.parkreldis.2004.05.005

[3] ³ Ray Dorsey E, Elbaz A, Nichols E, Abd-Allah F, Abdelalim A, Adsuar JC, et al. Global, regional, and national burden of Parkinson’s disease, 1990-2016: a systematic analysis for the Global Burden of Disease Study 2016. Lancet Neurol 2018;17:939-53. https://doi.org/10.1016/S1474-4422(18)30295-3
» https://doi.org/10.1016/S1474-4422(18)30295-3

[4] ⁴ Karabayir I, Goldman SM, Pappu S, Akbilgic O. Gradient boosting for Parkinson’s disease diagnosis from voice recordings. BMC Med Inform Decis Mak 2020;20:228. https://doi.org/10.1186/s12911-020-01250-7
» https://doi.org/10.1186/s12911-020-01250-7

[5] ⁵ Rehman RZU, Del Din S, Guan Y, Yarnall AJ, Shi JQ, Rochester L. Selecting Clinically Relevant Gait Characteristics for Classification of Early Parkinson’s Disease: A Comprehensive Machine Learning Approach. Sci Rep 2019;9:17269. https://doi.org/10.1038/s41598-019-53656-7
» https://doi.org/10.1038/s41598-019-53656-7

[6] ⁶ Jankovic J. Parkinson’s disease: Clinical features and diagnosis. J Neurol Neurosurg Psychiatry 2008;79:368-76. https://doi.org/10.1136/jnnp.2007.131045
» https://doi.org/10.1136/jnnp.2007.131045

[7] ⁷ Jain D, Singh V. Feature selection and classification systems for chronic disease prediction: A review. Egypt. Inform. J. 2018;19:179-89. https://doi.org/10.1016/j.eij.2018.03.002
» https://doi.org/10.1016/j.eij.2018.03.002

[8] ⁸ Malvea A, Babaei F, Boulay C, Sachs A, Park J. Deep brain stimulation for Parkinson’s Disease: A Review and Future Outlook. Biomed Eng Lett 2022;12:303-16. https://doi.org/10.1007/s13534-022-00226-y
» https://doi.org/10.1007/s13534-022-00226-y

[9] ⁹ Karapinar Senturk Z. Early diagnosis of Parkinson’s disease using machine learning algorithms. Med Hypotheses 2020;138:109603. https://doi.org/10.1016/j.mehy.2020.109603
» https://doi.org/10.1016/j.mehy.2020.109603

[10] ¹⁰ Sakar CO, Serbes G, Gunduz A, Tunc HC, Nizam H, Sakar BE, et al. A comparative analysis of speech signal processing algorithms for Parkinson’s disease classification and the use of the tunable Q-factor wavelet transform. Appl Soft Comput 2019;74. https://doi.org/10.1016/j.asoc.2018.10.022
» https://doi.org/10.1016/j.asoc.2018.10.022

[11] ¹¹ Jeancolas L, Mangone G, Petrovska-Delacrétaz D, Benali H, Benkelfat B-E, Arnulf I, et al. Voice characteristics from isolated rapid eye movement sleep behavior disorder to early Parkinson’s disease. Parkinsonism Relat Disord 2022;95:86-91. https://doi.org/10.1016/j.parkreldis.2022.01.003
» https://doi.org/10.1016/j.parkreldis.2022.01.003

[12] ¹² Naranjo L, Pérez CJ, Campos-Roca Y, Martín J. Addressing voice recording replications for Parkinson’s disease detection. Expert Syst Appl 2016;46:286-92. https://doi.org/10.1016/j.eswa.2015.10.034
» https://doi.org/10.1016/j.eswa.2015.10.034

[13] ¹³ Rusz J, Hlavnička J, Novotný M, Tykalová T, Pelletier A, Montplaisir J, et al. Speech Biomarkers in Rapid Eye Movement Sleep Behavior Disorder and Parkinson Disease. Ann Neurol 2021;90:62-75. https://doi.org/10.1002/ana.26085
» https://doi.org/10.1002/ana.26085

[14] ¹⁴ Rusz J, Tykalová T, Novotný M, Zogala D, Růžička E, Dušek P. Automated speech analysis in early untreated Parkinson’s disease: Relation to gender and dopaminergic transporter imaging. Eur J Neurol 2022;29:81-90. https://doi.org/10.1111/ene.15099
» https://doi.org/10.1111/ene.15099

[15] ¹⁵ Joshi D, Khajuria A, Joshi P. An automatic non-invasive method for Parkinson’s disease classification. Comput Methods Programs Biomed 2017;145:135-45. https://doi.org/10.1016/j.cmpb.2017.04.007
» https://doi.org/10.1016/j.cmpb.2017.04.007

[16] ¹⁶ Tougui I, Jilbab A, Mhamdi J El. Machine Learning Smart System for Parkinson Disease Classification Using the Voice as a Biomarker. Healthc Inform Res 2022;28:210-21. https://doi.org/10.4258/hir.2022.28.3.210
» https://doi.org/10.4258/hir.2022.28.3.210

[17] ¹⁷ Visibelli A, Roncaglia B, Spiga O, Santucci A. The Impact of Artificial Intelligence in the Odyssey of Rare Diseases. Biomedicines 2023;11:887. https://doi.org/10.3390/biomedicines11030887
» https://doi.org/10.3390/biomedicines11030887

[18] ¹⁸ Hastie T, Tibshirani R, Friedman J. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. vol. I. 2nd Ed. Springer; 2009.

[19] ¹⁹ Rashidi HH, Tran NK, Betts EV, Howell LP, Green R. Artificial Intelligence and Machine Learning in Pathology_ The Present Landscape of Supervised Methods _ Enhanced Reader. Acad Pathol 2019;6. https://doi.org/10.1177/2374289519873088
» https://doi.org/10.1177/2374289519873088

[20] ²⁰ Moez Ali. PyCaret: An open source, low-code machine learning library in Python 2020. https://www.pycaret.org (Accessed January 28, 2023).
» https://www.pycaret.org

[21] ²¹ Tolios G. Simplifying Machine Learning with PyCaret A Low-code Approach for Beginners and Experts! . vol. 1. 1st ed. Leanpub book; 2022.

[22] ²² Alshammri R, Alharbi G, Alharbi E, Almubark I. Machine learning approaches to identify Parkinson’s disease using voice signal features. Front Artif Intell 2023;6. https://doi.org/10.3389/frai.2023.1084001
» https://doi.org/10.3389/frai.2023.1084001

[23] ²³ Little MA, McSharry PE, Hunter EJ, Spielman J, Ramig LO. Suitability of dysphonia measurements for telemonitoring of Parkinson’s disease. IEEE Trans Biomed Eng 2009;56:1015-22. https://doi.org/10.1109/TBME.2008.2005954
» https://doi.org/10.1109/TBME.2008.2005954

[24] ²⁴ Little M. Parkinsons. UCI Machine Learning Repository, 2008. https://doi.org/https://doi.org/10.24432/C59C74
» https://doi.org/https://doi.org/10.24432/C59C74

[25] ²⁵ Qasim HM, Ata O, Ansari MA, Alomary MN, Alghamdi S, Almehmadi M. Hybrid Feature Selection Framework for the Parkinson Imbalanced Dataset Prediction Problem. Medicina (B Aires) 2021;57:1217. https://doi.org/10.3390/medicina57111217
» https://doi.org/10.3390/medicina57111217

[26] ²⁶ Jatoth C, Neelima E, Mayuri AVR, Rao AS. Effective monitoring and prediction of Parkinson disease in Smart Cities using intelligent health care system. Microprocess Microsyst 2022;92:104547. https://doi.org/10.1016/j.micpro.2022.104547
» https://doi.org/10.1016/j.micpro.2022.104547

[27] ²⁷ Vabalas A, Gowen E, Poliakoff E, Casson AJ. Machine learning algorithm validation with a limited sample size. PLoS One 2019;14:e0224365. https://doi.org/10.1371/journal.pone.0224365
» https://doi.org/10.1371/journal.pone.0224365

[28] ²⁸ Guimaraes MT, Medeiros AG, Almeida JS, Falcao y Martin M, Damasevicius R, Maskeliunas R, et al. An Optimized Approach to Huntington’s Disease Detecting via Audio Signals Processing with Dimensionality Reduction. 2020 International Joint Conference on Neural Networks (IJCNN), IEEE; 2020, p. 1-8. https://doi.org/10.1109/IJCNN48605.2020.9206773
» https://doi.org/10.1109/IJCNN48605.2020.9206773

[29] ²⁹ Suppa A, Costantini G, Asci F, Di Leo P, Al-Wardat MS, Di Lazzaro G, et al. Voice in Parkinson’s Disease: A Machine Learning Study. Front Neurol 2022;13. https://doi.org/10.3389/fneur.2022.831428
» https://doi.org/10.3389/fneur.2022.831428

[30] ³⁰ Medeiros TA, Saraiva Junior RG, Cassia G de S e, Nascimento FA de O, Carvalho JLA de. Classification of 1p/19q Status in Low-Grade Gliomas: Experiments with Radiomic Features and Ensemble-Based Machine Learning Methods. Braz. Arch. Biol. Technol. 2023;66. https://doi.org/10.1590/1678-4324-2023230002
» https://doi.org/10.1590/1678-4324-2023230002

[31] ³¹ Prati RC, Batista GE de APA, Monard MC. Curvas ROC para avaliação de classificadores. IEEE Lat. Am. Trans, 2008;6(2):215-22. https://doi.org/10.1109/TLA.2008.4609920
» https://doi.org/10.1109/TLA.2008.4609920

[32] ³² Bao G, Lin M, Sang X, Hou Y, Liu Y, Wu Y. Classification of Dysphonic Voices in Parkinson’s Disease with Semi-Supervised Competitive Learning Algorithm. Biosensors (Basel) 2022;12:502. https://doi.org/10.3390/bios12070502
» https://doi.org/10.3390/bios12070502

[33] ³³ Gil D, Johnson M. Diagnosing Parkinson by using Artificial Neural Networks and Support Vector Machines. Glob. J. Comput. Sci. Technol. 2009;9:63-71.

[34] ³⁴ Anila M, Pradeepini G. Diagnosis of Parkinson’s Disease using Artificial Neural Network. Int. J. Emerg. Trends Eng. Res. 2020;8:3700-7. https://doi.org/10.30534/ijeter/2020/131872020
» https://doi.org/10.30534/ijeter/2020/131872020

[35] ³⁵ Rasheed J, Hameed AA, Ajlouni N, Jamil A, Ozyavas A, Orman Z. Application of Adaptive Back-Propagation Neural Networks for Parkinson’s Disease Prediction. 2020 International Conference on Data Analytics for Business and Industry: Way Towards a Sustainable Economy (ICDABI), IEEE; 2020, p. 1-5. https://doi.org/10.1109/ICDABI51230.2020.9325709
» https://doi.org/10.1109/ICDABI51230.2020.9325709

[36] ³⁶ Kadam VJ, Jadhav SM. Feature Ensemble Learning Based on Sparse Autoencoders for Diagnosis of Parkinson’s Disease, 2019, p. 567-81. https://doi.org/10.1007/978-981-13-1513-8_58
» https://doi.org/10.1007/978-981-13-1513-8_58

[37] ³⁷ Lahmiri S, Dawson DA, Shmuel A. Performance of machine learning methods in diagnosing Parkinson’s disease based on dysphonia measures. Biomed Eng Lett 2018;8:29-39. https://doi.org/10.1007/s13534-017-0051-2
» https://doi.org/10.1007/s13534-017-0051-2

[38] ³⁸ Marar S, Swain D, Hiwarkar V, Motwani N, Awari A. Predicting the occurrence of Parkinson’s Disease using various Classification Models. 2018 International Conference on Advanced Computation and Telecommunication (ICACAT), IEEE; 2018, p. 1-5. https://doi.org/10.1109/ICACAT.2018.8933579
» https://doi.org/10.1109/ICACAT.2018.8933579

[39] ³⁹ Al-Fatlawi AH, Jabardi MH, Ling SH. Efficient diagnosis system for Parkinson’s disease using deep belief network. 2016 IEEE Congress on Evolutionary Computation (CEC), IEEE; 2016, p. 1324-30. https://doi.org/10.1109/CEC.2016.7743941
» https://doi.org/10.1109/CEC.2016.7743941

[40] ⁴⁰ Yang S, Zheng F, Luo X, Cai S, Wu Y, Liu K, et al. Effective Dysphonia Detection Using Feature Dimension Reduction and Kernel Density Estimation for Patients with Parkinson’s Disease. PLoS One 2014;9:e88825. https://doi.org/10.1371/journal.pone.0088825
» https://doi.org/10.1371/journal.pone.0088825

[41] ⁴¹ Das R. A comparison of multiple classification methods for diagnosis of Parkinson disease. Expert Syst Appl 2010;37:1568-72. https://doi.org/10.1016/j.eswa.2009.06.040
» https://doi.org/10.1016/j.eswa.2009.06.040

[42] ⁴² Rusz J, Tykalova T, Novotny M, Zogala D, Sonka K, Ruzicka E, et al. Defining Speech Subtypes in De Novo Parkinson Disease. Neurology 2021;97:e2124-35. https://doi.org/10.1212/WNL.0000000000012878
» https://doi.org/10.1212/WNL.0000000000012878

[43] ⁴³ Rusz J, Švihlík J, Krýže P, Novotný M, Tykalová T. Reproducibility of Voice Analysis with Machine Learning. Mov. Disord. 2021;36:1282-3. https://doi.org/10.1002/mds.28604
» https://doi.org/10.1002/mds.28604

[44] ⁴⁴ Skodda S, Visser W, Schlegel U. Gender-Related Patterns of Dysprosody in Parkinson Disease and Correlation Between Speech Variables and Motor Symptoms. J. Voice 2011;25:76-82. https://doi.org/10.1016/j.jvoice.2009.07.005
» https://doi.org/10.1016/j.jvoice.2009.07.005

[45] ⁴⁵ Rusz J, Tykalová T, Novotný M, Růžička E, Dušek P. Distinct patterns of speech disorder in early-onset and late-onset de-novo Parkinson’s disease. NPJ Parkinsons Dis 2021;7:98. https://doi.org/10.1038/s41531-021-00243-1
» https://doi.org/10.1038/s41531-021-00243-1

[46] ⁴⁶ Rahman W, Lee S, Islam MS, Antony VN, Ratnu H, Ali MR, et al. Detecting Parkinson Disease Using a Web-Based Speech Task: Observational Study. J Med Internet Res 2021;23:e26305. https://doi.org/10.2196/26305
» https://doi.org/10.2196/26305

	Healthy Controls	PD	Total
Recording number			195
Participants	8	23	31
Sex Male Female	3 5	16 7	19 12

Parameter Family	Abbreviation	Description
MDVP	Fo (Hz)	Average vocal fundamental frequency
	Fhi (Hz)	Maximum vocal fundamental frequency
	Flo (Hz)	Minimum vocal fundamental frequency
	Jitter (%)	Jitter as a percentage
	Jitter (Abs)	Absolute jitter in microseconds
	RAP	Relative amplitude perturbation
	PPQ	Five-point período perturbation quotient
	Shimmer	Local shimmer
	Shimmer (dB)	Local shimmer in decibels
	APQ	eleven-point amplitude perturbation quotient
Jitter	DDP	Average absolute difference of differences between cycles, divided by the average period
Shimmer	APQ3	Three-point amplitude perturbation quotient
	APQ5	Five-point amplitude perturbation quotient
	DDA	Average absolute difference between consecutive differences between the amplitudes of consecutive periods
Others	NHR	Noise to harmonics ratio
	HNR	Harmonics to noise ratio
	RPDE	Recurrence period density entropy
	D2	Correlation dimension
	DFA	Detrended fluctuation analysis
	Spread-1, Spread-2	Non-linear measures of fundamental frequency variation
	PPE	Pitch period entropy
Target	Status	0 - Healthy, 1 - PD

Model	Training data Mean of Accuracy, Recall, Precision, F1-Score (%), AUC ([0,¹]) and Matthews ([-¹,¹])
Model	Accuracy±std	Recall	Precision	F1-Score	AUC	MCC
ETC	91.87 ± 9.10	93.89	95.53	94.34	0.9772	0.8074
CATBOOST	90.33 ± 10.20	93.06	94.55	93.39	0.9459	0.7084
RF	89.62 ± 8.10	92.06	94.79	92.90	0.9618	0.6951
GBC	88.13 ± 6.60	92.35	93.02	92.11	0.9573	0.6642
DT	87.64 ± 9.20	92.16	91.29	91.45	0.8450	0.7050
LIGHTGBM	87.36 ± 8.64	92.17	91.73	91.40	0.9329	0.6359
ADA	86.10 ± 7.50	88.17	93.51	89.87	0.9342	0.6952
XGBOOST	85.27 ± 8.70	89.26	91.31	89.75	0.9303	0.6383
KNN	85.22 ± 8.60	82.24	97.75	88.90	0.9549	0.6828
RBFSVM	84.56 ± 9.07	85.98	93.62	88.86	0.9362	0.6110
QDA	83.41 ± 12.70	91.98	86.95	89.05	0.8403	0.5602
RIDGE	79.51 ± 8.57	80.96	91.33	85.03	0.0000	0.5071
LDA	78.85 ± 9.30	81.24	90.88	84.85	0.8618	0.4665
LR	78.79 ± 8.80	81.24	91.06	84.68	0.9287	0.4721
NB	78.63 ± 10.18	75.29	95.42	83.45	0.8809	0.5701
SVM	77.31 ± 9.28	81.35	88.08	83.92	0.0000	0.4261
DUMMY	24.07 ± 7.54	0.00	0.00	0.00	0.5000	0.0000

Model	Training data before tuning	Testing data after tuning
ETC	Accuracy = 91.87% AUC = 0.9772	Accuracy = 94.92% AUC = 0.9689
	Recall = 93.89%	Recall = 95.45%
	Precision = 95.53%	Precision = 97.67%
	F1-Score = 94.34%	F1-Score = 96.55%
	MCC = 0.8074	MCC = 0.8696
GBC	Accuracy = 88.13% AUC = 0.9573	Accuracy = 93.22% AUC = 0.9303
	Recall = 92.35%	Recall = 97.73%
	Precision = 93.02%	Precision = 93.48%
	F1-Score = 92.11%	F1-Score = 95.56%
	MCC = 0.6642	MCC = 0.8166
KNN	Accuracy = 85.22% AUC = 0.9549	Accuracy = 98.30% AUC = 0.9878
	Recall = 82.24%	Recall = 97.73%
	Precision = 97.75%	Precision = 100%
	F1-Score = 88.90%	F1-Score = 98.85%
	MCC = 0.6828	MCC = 0.9572

References	Technique of Feature Selection	Ratio of Train/Test Split	Over Sample Technique	Classify Method (Top 5)	Accuracy (%)
Raya (2023) [²²]	SelecKBest	70 / 30	SMOTE	KNN	90.76
				SVM	95.00
				DTC	90.00
				RFC	92.00
				MLP	98.31
Anila M (2020) [³⁴]	Principle Component Analysis	-	-	ANN	97.06
				RF	91.52
				SVM	93.22
				XGBOOST	88.13
				KNN	86.44
Rasheed (2020) [³⁵]	Principle Component Analysis	-	-	BPVAM, BPVAM-PCA	97.50
Senturk (2020) [⁹]	Feature Importance and Recursive Feature Elimination	-	-	CART	90.76
				SVM	93.84
				ANN	91.54
Kadam and Jadhav (2019) [³⁶]	-	50 / 50	-	DNN	92.19
Kadam and Jadhav (2019) [³⁶]	-	50 / 50	-	FESA-DNN	93.48
Lahmiri, S et al. (2018) [³⁷]	-	65 / 35	-	SVM	92.00
				NB	75.00
				LDA	70.00
				RT	70.00
				KNN	69.00
Marar S et al. (2018) [³⁸]	-	80 / 20	-	ANN	94.87
				RF	87.17
				SVM	84.61
				Kernel SVM	84.61
				K(10)NN	84.61
Al-Fatlawi et al. (2016) [³⁹]	Feature Selection	74 / 26	-	DBN	94.00
Yang S, Zheng F, Luo X et al. (2014) [⁴⁰]	Sequential Feature Selection	75 / 25	-	MAP	91.80
				FLDA	79.00
				SVMRBF	91.40
Das (2010) [⁴¹]	Variable Selection Component	65 / 35	-	Neural Network	92.90
				DMNeural	84.30
				Regression	88.60
				DT	84.30
Gil and Johnson (2009) [³³]	-	-	SMOTE	MLP	92.31
Gil and Johnson (2009) [³³]	-	-	SMOTE	SVM	93.33
Our proposed work	CFS	70 / 30	SMOTE	ETC	94.92
				GBC	93.22
				KNN	98.30
				MLP	91.53
				RFC	93.22