KSG estimation of reconstruction delay to detect vocal disorders in nonlinear dynamical analysis

Introduction: This research investigates the applicability of a relatively new estimator of mutual information, the KSG estimator, to finding the reconstruction delay of phase space in dynamical systems. There is evidence that the KSG estimator is more accurate than the naive method commonly used. Methods: We estimated the mutual information between voice signals and their delayed versions with the KSG method. The voice signals were obtained from a disordered voice database. We then took as the reconstruction delay the delay at which mutual information reached its first minimum. We used the resulting reconstruction delay in linear discriminant analysis, in order to discriminate between healthy and pathological voices or between pathologies. Discrimination between voice pathologies using nonlinear measurements is still little explored. Moreover, in this paper we used a single nonlinear measurement: the reconstruction delay. Results: The reconstruction delay obtained with the KSG method increased classification rates in most cases, in terms of accuracy, sensitivity and specificity, when compared to the naive estimator usually adopted. Conclusion: The KSG estimator is a promising technique to improve the diagnosis of voice-related pathologies.


Introduction
The mechanisms of voice production are of great complexity (Pontes et al., 2005). Many laryngeal diseases cause changes in the voice. These pathologies may be of organic origin, such as nodules, cysts or edemas, or of neurological origin, such as paralysis of the vocal folds (Davis, 1979; Quek et al., 2002). The laryngeal pathologies nodule, Reinke's edema and paralysis of the vocal folds are widely used in studies involving the classification of laryngeal pathologies in adults, both male and female (Barbosa-Branco and Romariz, 2006; Costa et al., 2013; Cummings, 2008; Pinho et al., 2016).
Acoustic evaluation methods have attracted research interest for the development of tools to support the diagnosis of laryngeal pathologies, because they are less traumatic and cause no discomfort to the patient when compared to traditional examinations for the detection of laryngeal pathologies. They can be used to evaluate voice quality, to perform pre-diagnosis of laryngeal pathologies, and to follow the evolution of a medical or post-surgical treatment (Rabiner and Schafer, 1978).
There are several nonlinearities involved in vocal fold vibration and glottal wave generation. For this reason, classical methods of data analysis based on a linear model have been enriched with methods derived from the theory of nonlinear dynamical systems (Jiang et al., 2006). Over the last two decades, research using the techniques of nonlinear dynamical systems and chaos theory has included phoneme classification (Kokkinos and Maragos, 2005), automatic speaker recognition (Reynolds and Heck, 2000), discrimination between healthy and pathological voices (Henriquez et al., 2009), and diagnosis of laryngeal pathologies and effects of clinical treatments (Awan et al., 2010; Chai et al., 2011; Vaziri et al., 2010).
The main objective of this paper is to increase the classification rates between healthy and pathological voices, as well as to discriminate between pathologies (which is still little explored in the literature), by using a more efficient method to estimate one parameter used in nonlinear dynamical analysis (NDA): the reconstruction delay. Fraser and Swinney (1986) introduced a method for estimating the reconstruction delay as the first minimum of the mutual information, in place of the lag at which the autocorrelation first crosses zero. Mutual information is an established concept from information theory that measures dependency between random variables and was first introduced to measure channel capacity (Cover and Thomas, 2006; Shannon and Weaver, 1949). By estimating mutual information with a more efficient estimator, proposed by Kraskov et al. (2004) and here called the KSG estimator, we observed improved classification rates compared with the naive estimator commonly used (which is based on adaptive partitioning).

Methods
This section presents the methods and data used in this paper in order to compare and to improve the classification rates in NDA.

Naive estimation of reconstruction delay
As mentioned above, the average mutual information method is often used to find the reconstruction delay (Fraser and Swinney, 1986). This method yields reconstruction vectors whose components carry as little redundant information as possible while still being correlated. In information-theoretic terms, the question is how much information a measurement made at a given time t provides about another measurement of the same signal at a later time t + τ (Kantz and Schreiber, 2004).
The average mutual information between $x(t)$ and its delayed version $x(t+\tau)$ with the naive estimator is obtained from a histogram of $b$ bins, created to estimate the probability distribution of the data signal $x(t)$ (Costa et al., 2012):

$$I(\tau) = \sum_{i=1}^{b} \sum_{j=1}^{b} p_{ij}(\tau) \log \frac{p_{ij}(\tau)}{p_i \, p_j} \qquad (1)$$

where $p_i$ is the probability that $x(t)$ falls in the $i$-th interval and $p_{ij}(\tau)$ is the joint probability that $x(t)$ falls in the $i$-th interval and $x(t+\tau)$ in the $j$-th interval (Kantz and Schreiber, 2004). The reconstruction delay is then the value of $\tau$ for which the average mutual information function reaches its first local minimum (Fraser and Swinney, 1986).
However, there is evidence that the naive estimator for mutual information, obtained as above from histograms, is severely biased (Assis et al., 2016; Darbellay and Vajda, 1999). That is, the estimated value may be neither equal nor close to the true mutual information. This happens because the estimate depends strongly on the number $b$ of bins (segments) used.
Kraskov et al. (2004) developed a mutual information estimator (the KSG estimator). It is based on the work of Kozachenko and Leonenko (1987), which estimates entropy from the distances to the k-th nearest neighbours. Mutual information can be written as (Equation 2):

$$I(X;Y) = H(X) + H(Y) - H(X,Y) \qquad (2)$$
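The histogram-based estimate and the first-minimum rule of this subsection can be sketched as follows. This is a minimal illustration, not the implementation used in the paper: it assumes equal-width bins (rather than adaptive partitioning), natural logarithms, and a simple three-point test for the first local minimum. The function names, the default of 16 bins and the `max_delay` parameter are hypothetical choices made for the example.

```python
import numpy as np


def histogram_mutual_information(x, y, bins=16):
    """Naive (histogram) mutual information estimate in nats, equal-width bins."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    p_xy = joint / joint.sum()          # joint probabilities p_ij
    p_x = p_xy.sum(axis=1)              # marginal of the first signal
    p_y = p_xy.sum(axis=0)              # marginal of the second signal
    product = np.outer(p_x, p_y)
    occupied = p_xy > 0                 # empty cells contribute nothing
    return float(np.sum(p_xy[occupied] * np.log(p_xy[occupied] / product[occupied])))


def first_minimum_delay(signal, max_delay=50, bins=16):
    """Return the first delay (in samples) at which I(tau) has a local minimum."""
    signal = np.asarray(signal, dtype=float)
    # mi[i] holds the estimate for tau = i + 1
    mi = [
        histogram_mutual_information(signal[:-tau], signal[tau:], bins)
        for tau in range(1, max_delay + 1)
    ]
    for i in range(1, len(mi) - 1):
        if mi[i] < mi[i - 1] and mi[i] <= mi[i + 1]:
            return i + 1
    return None  # no local minimum found within max_delay
```

For a noisy sinusoid, the first minimum is expected near a quarter of the period, since the signal and its delayed copy are then least redundant.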

KSG estimation of reconstruction delay
The basic idea of the KSG estimator is to use different neighbours to estimate the marginal entropies $H(X)$ and $H(Y)$ and the joint entropy $H(X,Y)$, in order to cancel estimation bias. Kraskov et al. (2004) proposed two slightly different estimators with similar performance; one of them is adopted here. This estimator considers the distance of each point to its k-th nearest neighbour, projects this distance onto X and Y and takes the wider spacing of the two: $\epsilon/2 = \max\{\epsilon_x/2, \epsilon_y/2\}$. With these distances it is possible to count the numbers of points $n_x$ and $n_y$, along X and Y respectively, that lie at a distance strictly less than $\epsilon/2$, as illustrated in Figure 1.
Figure 1. Determination of $\epsilon(n)$, $n_x$ and $n_y$ for the KSG algorithm, for $k = 2$ and a data sample realization (indicated by n). In this example, $n_x = 3$, $n_y = 5$ and $N = 13$.
The estimate is performed as:

$$\hat{I}(X;Y) = \psi(k) - \langle \psi(n_x + 1) + \psi(n_y + 1) \rangle + \psi(N) \qquad (3)$$

where $\langle \cdot \rangle$ denotes the arithmetic mean, $k$ is the number of neighbours considered, $N$ is the sample size and $\psi$ is the digamma function. To estimate the reconstruction delay, just as with the naive method, each speech signal was delayed by $\tau$ and the mutual information was then estimated between the samples of the original signal and the samples of the delayed signal. The number of samples used to estimate mutual information was the number of samples of the speech signal.
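A minimal sketch of Equation 3 is given below, under several assumptions that are not taken from the paper: a brute-force neighbour search (quadratic in the number of samples, so only suitable for short segments), the maximum norm in the joint space, and no tied sample values. Because every argument of the digamma function in Equation 3 is a positive integer, the sketch evaluates it through harmonic numbers. The function name `ksg_mutual_information` is hypothetical.

```python
import numpy as np


def ksg_mutual_information(x, y, k=3):
    """KSG-style mutual information estimate in nats (brute force, O(N^2) memory)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    # Pairwise distances along each axis and in the joint space (maximum norm).
    dx = np.abs(x[:, None] - x[None, :])
    dy = np.abs(y[:, None] - y[None, :])
    dz = np.maximum(dx, dy)
    # A point is never counted as its own neighbour.
    for d in (dx, dy, dz):
        np.fill_diagonal(d, np.inf)
    # Distance from each point to its k-th nearest neighbour in the joint space.
    eps_half = np.partition(dz, k - 1, axis=1)[:, k - 1]
    # Points strictly closer than that distance along each axis.
    n_x = np.sum(dx < eps_half[:, None], axis=1)
    n_y = np.sum(dy < eps_half[:, None], axis=1)
    # Digamma at positive integers: psi(m) = -gamma + H_(m-1), H_0 = 0.
    harmonic = np.concatenate(([0.0], np.cumsum(1.0 / np.arange(1, n + 1))))

    def psi(m):
        return -np.euler_gamma + harmonic[np.asarray(m) - 1]

    return float(psi(k) - np.mean(psi(n_x + 1) + psi(n_y + 1)) + psi(n))
```

To obtain the mutual information at delay tau, the function would be called as `ksg_mutual_information(signal[:-tau], signal[tau:], k)`. For full-length voice recordings a tree-based neighbour search would be needed in place of the dense distance matrices.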
If the distributions are very skewed and/or uneven, the authors of the method suggest transforming them so that they become more uniform (or at least single-humped and more or less symmetric). In this case, the KSG estimator gave excellent results after the variables were transformed.
Figure 2 illustrates examples of reconstructed phase space with the optimum KSG-estimated reconstruction delay τ for all speech classes considered in this paper.
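Phase space reconstruction from a scalar signal with delay τ can be sketched as a delay embedding. The embedding dimension is not stated in this section, so the default `dim=3` below is purely an assumption for illustration, as is the function name `delay_embed`.

```python
import numpy as np


def delay_embed(signal, tau, dim=3):
    """Build delay vectors [x(t), x(t + tau), ..., x(t + (dim - 1) * tau)]."""
    signal = np.asarray(signal, dtype=float)
    n_vectors = len(signal) - (dim - 1) * tau
    if n_vectors <= 0:
        raise ValueError("signal too short for this tau and dim")
    # Column i is the signal shifted by i * tau samples.
    return np.column_stack(
        [signal[i * tau : i * tau + n_vectors] for i in range(dim)]
    )
```

Each row of the returned array is one point of the reconstructed trajectory, so plotting the columns against each other gives a phase portrait of the kind shown in Figure 2.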

Experimental data
The Disordered Voice Database, Model 4337, from Kay Elemetrics, recorded by the Massachusetts Eye and Ear Infirmary (MEEI) Voice and Speech Lab (Kay Elemetrics Corp., 1994), was used in the experiments. The data comprised 53 voice signals from talkers with healthy larynges and 118 voice signals from talkers with laryngeal pathologies (55 with paralysis of the vocal folds, 45 with Reinke's edema, and 18 with vocal nodules). The voice signals are of the sustained vowel /a/. The voice signals of healthy larynges, originally sampled at 50000 samples/s, were sub-sampled to 25000 samples/s to match the sampling rate of the voice signals of pathological larynges.
Five classes of signals were considered in this study: healthy voice (SDL), voice signal with paralysis on vocal folds (PRL), voice signal with Reinke's edema (EDM), voice signal with nodules on vocal folds (NDL) and all pathologies grouped (PTL). The linear discriminant analysis was used to investigate seven cases of discrimination: SDL vs. PTL, SDL vs. PRL, SDL vs. EDM, SDL vs. NDL, PRL vs. EDM, PRL vs. NDL, and EDM vs. NDL.

Classification
The voice signals selected from the database were analyzed by estimating the reconstruction delay. Then, linear discriminant analysis with stratified k-fold cross-validation was performed to detect the presence of voice disorders caused by Reinke's edema, paralysis of the vocal folds and nodules on the vocal folds, and to compare with the results obtained using the naive estimator. In this work, k = 10 in the cross-validation process. Initially, the classification performance is analyzed considering only two groups of signals: one with all pathologies grouped and the other with healthy voices (SDL vs. PTL). Subsequently, the classification performance is analyzed for the other six classification cases: SDL vs. PRL, SDL vs. EDM, SDL vs. NDL, PRL vs. EDM, PRL vs. NDL, and EDM vs. NDL.
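The classification procedure can be sketched as a two-class linear discriminant on a single feature (the reconstruction delay) evaluated with stratified 10-fold cross-validation. The sketch below is an illustration under stated assumptions, not the authors' implementation: labels are coded as 0 and 1, the classes share a pooled variance, class priors are estimated from the training folds, and all function names are hypothetical.

```python
import numpy as np


def lda_fit(x, y):
    """Fit a one-feature, two-class LDA: class means, pooled variance, priors."""
    means = np.array([x[y == c].mean() for c in (0, 1)])
    priors = np.array([np.mean(y == c) for c in (0, 1)])
    pooled_var = np.sum((x - means[y]) ** 2) / (len(x) - 2)
    return means, pooled_var, priors


def lda_predict(model, x):
    """Assign each sample to the class with the larger linear discriminant score."""
    means, var, priors = model
    scores = np.outer(x, means) / var - means ** 2 / (2 * var) + np.log(priors)
    return np.argmax(scores, axis=1)


def stratified_folds(y, k, rng):
    """Give every sample a fold index, spreading each class evenly over folds."""
    fold = np.empty(len(y), dtype=int)
    for c in np.unique(y):
        idx = rng.permutation(np.flatnonzero(y == c))
        fold[idx] = np.arange(len(idx)) % k
    return fold


def cross_validated_predictions(x, y, k=10, seed=0):
    """Predict each sample with a model trained on the other k - 1 folds."""
    rng = np.random.default_rng(seed)
    fold = stratified_folds(y, k, rng)
    pred = np.empty_like(y)
    for f in range(k):
        test = fold == f
        model = lda_fit(x[~test], y[~test])
        pred[test] = lda_predict(model, x[test])
    return pred
```

Here `x` would hold one reconstruction delay per voice signal and `y` the corresponding class label, both as NumPy arrays with `y` of integer type.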

Evaluation and interpretation
In order to measure the performance of the classifiers in each study case, three measures are commonly used: accuracy, sensitivity and specificity. These measures are related to the ability of a classifier to diagnose a disease in a sick patient (True Positive, TP) or in a healthy patient (False Positive, FP), or to diagnose a healthy state in a healthy patient (True Negative, TN) or in a sick patient (False Negative, FN) (Costa et al., 2012).
Accuracy measures the global rate of correct classification, reflecting the ability of the classifier to correctly identify both the presence and the absence of a disorder. It is defined as the ratio between the number of correctly classified cases and the number of all cases presented to the classifier (Costa et al., 2012):

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
Sensitivity measures the ability of the classifier to identify the disorder when it actually exists. It is defined as the ratio between the number of correctly classified cases with the disorder and the total number of cases with the disorder:

$$\text{Sensitivity} = \frac{TP}{TP + FN}$$

Specificity measures the ability of the classifier to identify the absence of the disorder when it actually does not exist. It is defined as the ratio between the number of correctly classified healthy cases and the total number of healthy cases:

$$\text{Specificity} = \frac{TN}{TN + FP}$$

The classifier performs well if it obtains high values of accuracy, sensitivity and specificity. The interpretation of sensitivity and specificity is clearest for discrimination between healthy and pathological classes. For discrimination between pathologies, it must be defined which group has its correct classification measured by sensitivity and which group has its correct classification measured by specificity.
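The three measures follow directly from the four counts defined above. The short sketch below assumes labels coded as 1 for the class treated as positive (the disorder) and 0 for the other class; the function name is hypothetical.

```python
import numpy as np


def classification_measures(y_true, y_pred):
    """Return (accuracy, sensitivity, specificity) for labels coded 0/1."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))  # disorder correctly detected
    tn = np.sum((y_true == 0) & (y_pred == 0))  # healthy correctly identified
    fp = np.sum((y_true == 0) & (y_pred == 1))  # healthy labelled as disorder
    fn = np.sum((y_true == 1) & (y_pred == 0))  # disorder missed
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return float(accuracy), float(sensitivity), float(specificity)
```

When two pathologies are compared, the choice of which one is coded as 1 decides which class is reported through sensitivity and which through specificity.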

Results
Before presenting the classification results, in this section we evaluate both the naive and KSG estimators using synthetic signals. In the simulations, we generated bivariate Gaussian datasets with zero mean, unit variance and specific values of the correlation coefficient $\rho$. For these cases, there is an analytical value for mutual information, against which the estimates can be compared:

$$I(X;Y) = -\frac{1}{2} \log\left(1 - \rho^2\right)$$

Figure 4 illustrates the estimation with both methods. The simulations show that the KSG estimator has little bias, that is, the mean of the estimator is close to the analytical mutual information value for all five tested values of k. In contrast, the naive estimates vary widely with the number of bins used. Their mean value generally does not match the analytical mutual information value, especially when more bins are used.
We now present the results obtained from the classification of healthy and pathological voices, in order to investigate the discriminatory potential of a more precisely estimated reconstruction delay. The objective is to compare the detection and discrimination of voice disorders with the results obtained with the naive estimator. The voice disorders analyzed here are Reinke's edema, paralysis of the vocal folds and nodules on the vocal folds.
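The synthetic check described at the start of this section can be sketched as follows: draw correlated Gaussian pairs, compute the analytical mutual information, and observe how a histogram estimate changes with the number of bins. The correlation value, sample size and bin counts used here are illustrative assumptions, not the settings of the paper, and the histogram uses equal-width bins.

```python
import numpy as np


def gaussian_mutual_information(rho):
    """Analytical mutual information (nats) of a unit-variance bivariate Gaussian."""
    return -0.5 * np.log(1.0 - rho ** 2)


def sample_bivariate_gaussian(rho, n, rng):
    """Draw n zero-mean, unit-variance pairs with correlation coefficient rho."""
    cov = [[1.0, rho], [rho, 1.0]]
    return rng.multivariate_normal([0.0, 0.0], cov, size=n)


def histogram_mi(x, y, bins):
    """Naive histogram estimate of mutual information (nats), equal-width bins."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    p_xy = joint / joint.sum()
    product = np.outer(p_xy.sum(axis=1), p_xy.sum(axis=0))
    occupied = p_xy > 0
    return float(np.sum(p_xy[occupied] * np.log(p_xy[occupied] / product[occupied])))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    rho = 0.6  # illustrative value only
    data = sample_bivariate_gaussian(rho, 2000, rng)
    print("analytical:", gaussian_mutual_information(rho))
    for bins in (8, 16, 32, 64):
        print(bins, "bins:", histogram_mi(data[:, 0], data[:, 1], bins))
```

With a fixed sample size, the histogram estimate tends to grow as the number of bins increases, which is the bin dependence discussed above.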

Reconstruction delay estimation
Initially, we estimated the reconstruction delay using the naive estimator and the KSG estimator, in order to see which method yields lower values. This was an interesting investigatory step, since it is desirable to find the lowest value of τ that reduces redundant information among vectors. Figure 5 illustrates the distribution of average values of τ for signals of healthy voices (SDL) and signals of pathologies with Reinke's edema (EDM), nodules (NDL) and paralysis (PRL), for both the naive and KSG estimators. For the KSG estimator, we used the parameter k = 3, as recommended in the literature (Kraskov et al., 2004).
The results presented in Figure 5 reveal differences in the reconstruction delay between the naive estimator and the KSG estimator for all classes considered. In some cases the reconstruction delay falls by more than half. For example, for the edema class, the naive method estimates a reconstruction delay τ = 16 while the KSG estimator estimates τ = 8 for the same speech signal. For the nodule class, the naive method estimates τ = 15 while the KSG estimator estimates τ = 7 for the same speech signal. For the paralysis class, the naive method estimates τ = 20 while the KSG estimator estimates τ = 9 for the same speech signal.
We performed statistical tests of whether the distributions for each class and method were significantly different. We chose non-parametric statistical tests because they are more broadly applicable. To evaluate differences in the medians of the τ values estimated with both methods within a single class, we used the Wilcoxon signed-rank test. To evaluate differences between classes using the KSG method, we used the Wilcoxon rank-sum test. The results were: SDL(Naive) x SDL(KSG): p = 0.0042, h = 1; EDM(Naive) x EDM(KSG): p = 0.0001, h = 1; NDL(Naive) x NDL(KSG): p = 0.0005, h = 1; PRL(Naive) x PRL(KSG): p = [...]. Thus, there are significant differences between the KSG-estimated τ and the naive-estimated τ for all considered classes. When using the KSG method to estimate τ, the comparisons SDL x EDM, SDL x NDL, SDL x PRL, SDL x PTL and PRL x NDL presented statistically significant differences among the estimated τ. The only cases without a significant difference between the KSG-estimated τ were the comparisons EDM x NDL and PRL x EDM.
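The two tests named above are available in SciPy as `scipy.stats.wilcoxon` (signed-rank, paired samples) and `scipy.stats.ranksums` (rank-sum, independent samples). The sketch below shows one way they could be applied; the significance level of 0.05 used to derive the flag h is an assumption, since the level is not stated in the text, and the wrapper function names are hypothetical.

```python
import numpy as np
from scipy import stats


def compare_estimators(tau_naive, tau_ksg, alpha=0.05):
    """Paired comparison of delays from two estimators on the same signals.

    Returns (p, h), where h = 1 means the null hypothesis is rejected at alpha.
    """
    result = stats.wilcoxon(np.asarray(tau_naive), np.asarray(tau_ksg))
    return float(result.pvalue), int(result.pvalue < alpha)


def compare_classes(tau_class_a, tau_class_b, alpha=0.05):
    """Unpaired comparison of delays between two classes of signals."""
    result = stats.ranksums(np.asarray(tau_class_a), np.asarray(tau_class_b))
    return float(result.pvalue), int(result.pvalue < alpha)
```

Integer-valued delays can produce ties and zero differences, which affect how the signed-rank p-value is computed, so the SciPy documentation on tie and zero handling is worth checking before relying on the result.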

Classification between healthy voices versus affected voices by pathologies
This section presents the results obtained for healthy voice signals (SDL) versus voice signals affected by Reinke's edema (EDM), paralysis (PRL) and nodules (NDL), both individually (SDL x EDM, SDL x PRL and SDL x NDL) and with all pathologies grouped in a single class (SDL x PTL). Table 1 presents the accuracy, sensitivity and specificity obtained for healthy (SDL) versus pathological (PTL) voice signals using the KSG and naive estimators. The pathological voice signals comprise, in this case, the signals of all the pathologies (Reinke's edema, paralysis and nodule) grouped in the same class. Table 2 presents the accuracy, sensitivity and specificity obtained for SDL versus EDM, SDL versus PRL, and SDL versus NDL using the KSG estimator.
As seen from the previous tables, the results obtained using KSG estimation of the reconstruction delay differed noticeably from those obtained using the naive estimator in all classification cases. In the classification between healthy and pathological voices, the accuracy was 68.86%, compared with 60.66% for the naive estimator. In the classification Healthy x Reinke's edema, the accuracy was 69.22%, compared with 63.78% for the naive estimator. In the classification Healthy x Paralysis of the vocal folds, the accuracy was 70.18%, compared with 68.64% for the naive estimator. In the classification Healthy x Nodules on the vocal folds, the accuracy was 80.00%, compared with 73.32% for the naive estimator.

Classification of pathologically affected voices
This section presents the results obtained in the classification of voice signals affected by Reinke's edema (EDM), nodules on the vocal folds (NDL) and paralysis of the vocal folds (PRL): EDM x NDL, PRL x NDL and PRL x EDM. Table 3 presents the accuracy, sensitivity and specificity obtained for voice signals affected by EDM versus voice signals affected by NDL. It also presents the results for PRL versus NDL and for PRL versus EDM. As can be seen, for the classification case EDM x NDL, the reconstruction delay estimated with the KSG estimator gave higher average accuracy than the naive estimator. It also gave greater mean sensitivity.
For the classification case PRL x NDL, the reconstruction delay estimated with the KSG estimator gave higher average accuracy than the naive estimator. Sensitivity and specificity also increased.
For the classification case PRL x EDM, the KSG-estimated reconstruction delay gave lower average accuracy than the naive estimator: 43.44% versus 55.56%. However, the variance in this case was lower for all criteria with the KSG estimator.

Discussion
Discrimination between voice pathologies using nonlinear measurements is still little explored. Some papers combine several nonlinear measures, such as Costa et al. (2013), who combine 8 nonlinear measurements for healthy x pathological classification using the same database as this paper. For classification between pathological voices, these authors combine the 8 nonlinear measurements with LPC (Linear Predictive Coding). Pinho et al. (2016) use great, medium, minimum and maximum values of τ to reconstruct phase space and use image measurements to discriminate between healthy x pathological voices. Henriquez et al. (2009) combine 4 nonlinear measurements to discriminate between healthy x pathological voices. In this paper we used a single nonlinear measurement, both for healthy x pathological classification and for classification between pathological voices. We presented the reconstruction delay as a parameter for detecting the presence of voice pathologies or for discriminating between pathologies. This reconstruction delay was estimated with a relatively new estimation method, KSG, which has given less biased estimates than the naive estimator usually adopted.
The choice of a more accurate estimator makes the classifier more efficient, since KSG estimation approaches a value that better fits the characteristics of a true reconstruction delay. It is well known that a reconstruction delay that is too short will not capture the dynamics of the data, while one that is too large will make the components of the reconstruction vectors completely independent in a statistical sense (Abarbanel, 1996). The reconstruction delay estimated with the KSG method captures the dynamics of voice data more reliably, which in turn allows improved classification performance. This was observed in the results of this paper: the reconstruction delay estimated with the KSG method gave more accurate diagnoses in most cases.
For the first analyzed case, discrimination between healthy voices versus pathological voices, the classification with KSG estimator presented improved performance when compared to classification with the naive estimator, for the criteria accuracy and sensitivity.
The comparison between healthy voices and voices with some particular pathology (the second classification case: Healthy x Edema, Healthy x Paralysis and Healthy x Nodule) showed that the KSG estimator once again presented improved performance over the naive method.
In classification between pathological voices, the results obtained with the KSG estimator for the cases Edema x Nodule and Paralysis x Nodule confirm the superiority of this method. Only for the case Paralysis x Edema were the results with the naive estimator better than those with the KSG estimator. This may be due to some particular characteristic of the signals affected by paralysis and edema. Indeed, the statistical test showed that the KSG-estimated τ values for these two classes (paralysis and edema) were not significantly different, which explains why the discriminatory potential of the reconstruction delay is reduced in this particular case.
We stress that the results obtained in this paper are of interest both for the detection of a disease and for the classification between pathologies. The classification between pathologies is still a little-explored subject in the literature.
Finally, from the present work, we conclude that the use of KSG estimation for the reconstruction delay on phase space is a promising technique, which can be considered to improve the diagnosis of voice related pathologies.