Influences of the signal border extension in the discrete wavelet transform in EEG spike detection

Introduction: The discrete wavelet transform is used in many studies as a signal preprocessor for EEG spike detection. An inherent process of this mathematical tool is the recursive convolution of the wavelet filters with the signal, which is decomposed into detail and approximation coefficients. To perform these convolutions, it is first necessary to extend the signal borders. The selection of an unsuitable border extension algorithm may increase the false positive rate of an EEG spike detector. Methods: In this study we analyzed nine different border extensions used for convolution and 19 mother wavelets commonly seen in other EEG spike detectors in the literature. Results: The border extension may degrade the performance of an EEG spike detector by up to 44.11%. Furthermore, results behave differently for different numbers of wavelet coefficients. Conclusion: There is no single best border extension for every EEG spike detector based on the discrete wavelet transform; rather, the most adequate border extension depends on the number of coefficients of the mother wavelet.


Introduction
Epilepsy affects more than 50 million people around the world (World Health Organization, 2015). The term epilepsy does not refer to a specific disease but to a group of symptoms related to many causes, in which excessive discharges of groups of brain cells lead to sudden, recurrent and transient disturbances of mental functions and/or movements of the body (Kalayci et al., 1994). EEG abnormalities specific to epilepsy are ictal and interictal activity. Ictal activity occurs during epileptic seizures. Interictal activity consists of random spikes and sharp waves, occurring from every few hours to every few seconds (Gotman, 1981). Spikes and sharp waves have a duration from 20 to 200 ms (Chatrian et al., 1974; Noachtar et al., 1999), and their recognition in EEG exams is exhausting and time consuming. Despite more than two decades of research effort, epilepsy diagnosis still cannot count on a reliable automated process.
One technique used by researchers is based on the wavelet transform (WT) (Daubechies, 1990; Mallat, 1989) in its discrete implementation, the discrete wavelet transform. It is a powerful technique for processing EEG signals that uses recursive convolutions of specific coefficients with epochs of the EEG signal to extract features, which can then be used to classify these events as a spike or as normal brain activity. The closer the similarity between the mother wavelet and the signal under analysis, the better the performance of a detector using this mother wavelet.
The discrete implementation of the wavelet transform (DWT) is applied to a finite number of signal samples, even for long-term signals or continuous sampling. Each signal segment requires the treatment of the signal border for the recursive convolution calculation (Mallat, 1989). The border extension is a procedure that aims to mimic the signal behavior outside the signal sample window, attaching extra values at both sides of the signal window to smooth edge distortions. The border extension algorithm is performed at every decomposition level of the discrete wavelet transform, and its execution is unavoidable.
During the implementation of a proprietary EEG spike detector, it was noticed that each algorithm used to extend the EEG borders significantly modified the obtained DWT coefficients. Preliminary results showed that these variations affected the spike classification, since they change the features of the WT decompositions, such as their maximum and minimum values, their energy and their approximate entropy value (Pacola et al., 2012).
The distortions generated by the border extension are well known in areas such as mechanical engineering and image processing (Katunin, 2012; Montanari et al., 2015; Su et al., 2012). Quandt et al. (2015) reported border distortions for respiratory signals. However, for EEG signals, no report on distortions produced by border extension was found.
To assess the differences in performance of different border extensions, this paper presents the results of a detailed study comparing 19 wavelets and 9 border extensions over 494 EEG spike events and 1500 EEG non-spike events, together with the best combinations of wavelet and border extension.

EEG signal database
The EEG spike and non-spike events were selected from a database by a neurologist from Pequeno Príncipe Hospital, in Curitiba, PR, Brazil. The EEG signal was recorded using scalp electrodes placed according to the international 10-20 system, at 200 samples per second. The 494 spike events and the 1500 events containing normal brain activity (non-spike events) were collected from different parts of seizure-free sections from different non-correlated channels. These events were recorded as epochs with a duration of 2 seconds. The spike events are centered within the sample window. The durations of the collected spike events range from 60 ms to 200 ms. This research has been approved by the ethics committee (CEP/HPP 1104-12).
The wavelet transform was performed by its discrete implementation in five decomposition levels. Because the EEG signal was recorded at a sampling rate of 200 Hz, these five decompositions contain the gamma, beta, alpha, and theta wave spectra (Ercelebi and Subasi, 2006; Shuren and Zhong, 2009).
For this study, the Approximate Entropy (ApEn) was used as a signal feature descriptor. The ApEn is an algorithm that measures the complexity of the EEG spike event. This algorithm was introduced by Pincus (1990) and has been used in many studies in the literature to enhance EEG spike detection (Kumar et al., 2014; Ocak, 2009; Vavadi et al., 2010; Wang et al., 2009). The smaller its value, the more regular and predictable the sequence of samples. The ApEn input parameters were m = 2, and the considered deviation was set to 15% of the standard deviation of the input vector. The ApEn was calculated for the five decomposition levels resulting from the DWT.
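As a rough illustration of why five levels at 200 Hz cover these rhythms, the sketch below lists the nominal frequency band of each decomposition. It assumes the usual idealized dyadic split, in which detail level j spans fs/2^(j+1) to fs/2^j; real wavelet filters overlap between bands, and the function name `dwt_bands` is only illustrative.

```python
def dwt_bands(fs, levels):
    """Nominal (idealized) frequency band, in Hz, of each DWT decomposition.

    Assumes perfect half-band filters: detail level j covers
    fs / 2**(j + 1) to fs / 2**j, and the last approximation
    covers everything below the lowest detail band.
    """
    bands = {}
    for j in range(1, levels + 1):
        bands[f"D{j}"] = (fs / 2 ** (j + 1), fs / 2 ** j)
    bands[f"A{levels}"] = (0.0, fs / 2 ** (levels + 1))
    return bands


# Sampling rate and number of levels taken from the text.
bands = dwt_bands(fs=200, levels=5)
for name, (low, high) in bands.items():
    print(f"{name}: {low:g} to {high:g} Hz")
```

Under this assumption the five detail levels span roughly 3 Hz to 100 Hz, which is consistent with the theta to gamma range mentioned above.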
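A minimal sketch of the ApEn calculation with the parameters given above (m = 2, tolerance of 15% of the standard deviation) is shown below. It follows the usual formulation of Pincus's algorithm with the Chebyshev distance and self-matches counted; the use of the population standard deviation and the function name `approximate_entropy` are assumptions, and this is not the authors' Java implementation.

```python
import math
import random


def approximate_entropy(x, m=2, r_factor=0.15):
    """Approximate entropy ApEn(m, r) of the sequence x.

    r is r_factor times the standard deviation of x
    (population standard deviation, an assumption of this sketch).
    """
    n = len(x)
    mean = sum(x) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in x) / n)
    r = r_factor * std

    def phi(order):
        # All overlapping templates of length `order`.
        templates = [x[i:i + order] for i in range(n - order + 1)]
        total = 0.0
        for a in templates:
            # Count templates within tolerance r (Chebyshev distance).
            # The self-match guarantees a count of at least 1.
            matches = sum(
                1 for b in templates
                if max(abs(p - q) for p, q in zip(a, b)) <= r
            )
            total += math.log(matches / len(templates))
        return total / len(templates)

    return phi(m) - phi(m + 1)


# A perfectly regular sequence versus a noisy one (synthetic data, not EEG).
regular = [0.0, 1.0] * 50
rng = random.Random(0)
noisy = [rng.gauss(0.0, 1.0) for _ in range(100)]

apen_regular = approximate_entropy(regular)
apen_noisy = approximate_entropy(noisy)
print(apen_regular, apen_noisy)
```

The regular sequence yields a value close to zero, while the noisy one yields a larger value, in line with the interpretation that smaller ApEn means a more predictable sequence.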

Border extension
Nine border extensions are normally used during the convolution process: ZPD, SP0, SP1, PPD, PER, SYMH, SYMW, ASYMH, and ASYMW. In ZPD (Zero Padding), the values beyond the borders of the event are set to zero. SP0 (Smooth-Padding of order 0) replicates the first and last samples of the event. SP1 (Smooth-Padding of order 1) extrapolates the borders using the first derivative of the event. PPD (Periodic-Padding) and PER (Periodic Even Padding) consider that the event is periodic at its borders. SYMH (Symmetric-Padding Half-Point) performs a half-point symmetric replication of the border values of the event. SYMW (Symmetric-Padding Whole-Point) performs a whole-point symmetric replication of the border values. ASYMH (Antisymmetric-Padding Half-Point) and ASYMW (Antisymmetric-Padding Whole-Point) behave like SYMH and SYMW, but with antisymmetric replication. These signal border extensions are well explained by Misiti et al. (2013). For reference, the border extension SYMH is used as default by the MATLAB wavelet toolbox.
For instance, Figure 1 shows an epoch of a cosine signal to illustrate how the signal border is extended for convolution. The lines in different colors are the signal extended with the 9 different border extensions. In this study, instead of a cosine, it is the EEG event that is extended.
In DWT decomposition, the number of extra values to be added to the signal being convolved depends directly on the number of coefficients of the high-pass and low-pass filters. The number of coefficients of each wavelet analyzed in this work is presented in Table 1.
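The extension rules can be sketched in a few lines of code. The function below is an illustrative reading of eight of the nine modes, following the conventions commonly documented for wavelet toolboxes; it is not the authors' implementation, the name `extend` is hypothetical, and PER is omitted because it also changes how the transform itself is computed.

```python
import math


def extend(x, n, mode):
    """Return x with n extra samples attached to each side.

    Sketch only: assumes 0 < n < len(x) and the usual toolbox
    meaning of each mode name.
    """
    if not 0 < n < len(x):
        raise ValueError("this sketch assumes 0 < n < len(x)")
    head, tail = x[:n], x[-n:]
    # Same slices shifted by one sample, for the whole-point modes.
    inner_head, inner_tail = x[1:n + 1], x[-n - 1:-1]
    if mode == "zpd":      # zeros outside the window
        left, right = [0.0] * n, [0.0] * n
    elif mode == "sp0":    # repeat first and last sample
        left, right = [x[0]] * n, [x[-1]] * n
    elif mode == "sp1":    # linear extrapolation from the border slope
        dl, dr = x[1] - x[0], x[-1] - x[-2]
        left = [x[0] - k * dl for k in range(n, 0, -1)]
        right = [x[-1] + k * dr for k in range(1, n + 1)]
    elif mode == "ppd":    # periodic repetition
        left, right = tail, head
    elif mode == "symh":   # mirror, border sample repeated
        left, right = head[::-1], tail[::-1]
    elif mode == "symw":   # mirror, border sample not repeated
        left, right = inner_head[::-1], inner_tail[::-1]
    elif mode == "asymh":  # mirror with sign inversion
        left = [-v for v in head[::-1]]
        right = [-v for v in tail[::-1]]
    elif mode == "asymw":  # point reflection about the border sample
        left = [2 * x[0] - v for v in inner_head[::-1]]
        right = [2 * x[-1] - v for v in inner_tail[::-1]]
    else:
        raise ValueError(f"unknown mode: {mode}")
    return list(left) + list(x) + list(right)


# A short cosine epoch, as in Figure 1 (length and period chosen arbitrarily).
cosine = [math.cos(2 * math.pi * k / 16) for k in range(16)]
for mode in ("zpd", "sp0", "sp1", "ppd", "symh", "symw", "asymh", "asymw"):
    print(mode, [round(v, 2) for v in extend(cosine, 3, mode)[:4]])
```

Printing the first few samples of each extended signal shows how differently the modes continue the same epoch beyond its left border.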
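To give a feeling for this dependence, the sketch below compares the amount of padding with the amount of real data at each level. It assumes the convention used by common toolboxes for the non-periodized modes: a filter with L coefficients needs L - 1 extra samples per side, and each level returns floor((N + L - 1) / 2) coefficients. The epoch length of 400 samples follows from the 2-second epochs at 200 samples per second; the function names are illustrative.

```python
def coefficient_lengths(n_samples, filter_len, levels):
    """Number of coefficients per level, assuming floor((N + L - 1) / 2)."""
    lengths = []
    n = n_samples
    for _ in range(levels):
        n = (n + filter_len - 1) // 2
        lengths.append(n)
    return lengths


def padding_ratios(n_samples, filter_len, levels):
    """Padded samples (both sides) divided by real input samples, per level."""
    ratios = []
    n = n_samples
    for _ in range(levels):
        ratios.append(2 * (filter_len - 1) / n)
        n = (n + filter_len - 1) // 2
    return ratios


EPOCH = 400  # 2 s at 200 samples per second

haar_lengths = coefficient_lengths(EPOCH, 2, 5)    # Haar: 2 coefficients
db6_lengths = coefficient_lengths(EPOCH, 12, 5)    # DB6: 12 coefficients
haar_ratios = padding_ratios(EPOCH, 2, 5)
db6_ratios = padding_ratios(EPOCH, 12, 5)

print("Haar:", haar_lengths, [round(r, 3) for r in haar_ratios])
print("DB6: ", db6_lengths, [round(r, 3) for r in db6_ratios])
```

Under these assumptions the share of artificial samples grows at every level and grows faster for longer filters, which suggests why the deeper decompositions of long wavelets are the most exposed to the choice of border extension.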

Border distortions
Figure 2d presents the ApEn calculated over the spike and non-spike events in the database, where it is possible to see discriminatory information between both classes.

Spike detector
To assess the influence of the border extension in EEG spike detection, the detector structure presented in Figure 3 was used. The detector was written entirely in Java and no commercial tools were used.
In Figure 3, the EEG events are transformed by the discrete wavelet transform according to the selected border extension algorithm and decomposed into 5 decomposition levels in stage 1. After the wavelet transformation, the ApEn is calculated for each decomposition level in stage 2. Afterwards, the five extracted feature values are reduced in stage 3 to a single dimension. To achieve this dimensionality reduction, linear discriminant analysis (LDA) and the Fisher criterion for function maximization are used (Duda et al., 2001). The LDA calculates the mean value and scatter of the classes under analysis. Using eigenvectors and eigenvalues, a projection with reduced dimensions is determined, keeping the best separability between classes. The LDA is a mathematical tool and does not require iterative training.
In stage 4, a threshold is used to classify spike and non-spike events over the unidimensional projection generated by the LDA algorithm. Finally, the detection is assessed in stage 5.
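The projection step can be sketched as follows for the two-class case. For two classes, the direction that maximizes the Fisher criterion is proportional to S_w^{-1}(mu_1 - mu_0), which coincides with the leading eigenvector of S_w^{-1} S_b, so the sketch solves a linear system instead of running an eigen-decomposition. This is an assumption-laden illustration rather than the authors' Java code: the function names are hypothetical and the toy feature vectors are two-dimensional for readability, whereas the detector uses five ApEn features per event.

```python
def mean_vector(rows):
    """Component-wise mean of a list of feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def scatter_matrix(rows, mu):
    """Within-class scatter: sum of outer products of centered vectors."""
    d = len(mu)
    s = [[0.0] * d for _ in range(d)]
    for row in rows:
        diff = [v - m for v, m in zip(row, mu)]
        for i in range(d):
            for j in range(d):
                s[i][j] += diff[i] * diff[j]
    return s


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        acc = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - acc) / m[i][i]
    return x


def fisher_direction(class0, class1):
    """Two-class Fisher direction w = Sw^-1 (mu1 - mu0)."""
    mu0, mu1 = mean_vector(class0), mean_vector(class1)
    s0, s1 = scatter_matrix(class0, mu0), scatter_matrix(class1, mu1)
    sw = [[p + q for p, q in zip(r0, r1)] for r0, r1 in zip(s0, s1)]
    return solve(sw, [q - p for p, q in zip(mu0, mu1)])


def project(rows, w):
    """Project each feature vector onto the direction w."""
    return [sum(v * wi for v, wi in zip(row, w)) for row in rows]


# Toy feature vectors (hypothetical values, not taken from the EEG database).
non_spikes = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
spikes = [[3.0, 3.0], [4.0, 3.0], [3.0, 4.0], [4.0, 4.0]]

w = fisher_direction(non_spikes, spikes)
proj_non_spikes = project(non_spikes, w)
proj_spikes = project(spikes, w)
print(w, proj_non_spikes, proj_spikes)
```

Once every event is reduced to a single number, a threshold on that number is all that is needed to separate the two classes.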

Performance indexes
EEG spike events and non-spike events correctly classified are defined as True Positive (TP) and True Negative (TN), while misclassified EEG spike events and non-spike events are defined as False Negative (FN) and False Positive (FP), respectively (Youden, 1950).
To measure the influence of the border extension on the detector's performance, four indexes are used: Accuracy (Acc), True Positive Rate (TPR), False Positive Rate (FPR) and the area under the ROC Curve (AUC). These indexes are calculated using (1), (2), and (3), respectively.

$$\mathrm{Acc} = \frac{TP + TN}{TP + FP + FN + TN} \qquad (1)$$

ROC analysis is generally used to investigate the performance of a predictive model in separating positive from negative cases (Lasko et al., 2005).
A defined threshold in a linear classifier results in a pair of TPR and FPR. When varying the threshold value over the continuous range of classified distributions, pairs of TPR and FPR are produced. These pairs are used to plot the ROC curve in a Cartesian plot. The AUC index is the area under the ROC curve (Erkel and Pattynama, 1998; Fawcett, 2006).
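The threshold sweep can be sketched as below. The sketch assumes the standard definitions TPR = TP / (TP + FN) and FPR = FP / (FP + TN), treats an event as a spike when its score is at or above the threshold, and integrates the curve with the trapezoidal rule. The scores and labels are made-up values and the function names are illustrative.

```python
def roc_points(scores, labels):
    """ROC points (FPR, TPR) obtained by sweeping the threshold.

    labels: 1 for spike, 0 for non-spike. An event is classified as a
    spike when its score is >= the threshold. Assumes both classes
    are present.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]  # threshold above every score
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= thr and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= thr and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical one-dimensional classifier outputs and true classes.
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0, 1, 0, 0]

points = roc_points(scores, labels)
area = auc(points)
print(points)
print(area)
```

Because the AUC aggregates every threshold, it does not depend on any single operating point, unlike the accuracy and the FPR.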
Figure 4 illustrates the construction of the ROC curve and calculation of sensitivity, specificity, Acc, and FPR indexes and their location in the ROC curve.
Through the wavelet transformation, an EEG event is decomposed into five levels. For each decomposition level the ApEn is calculated, resulting in five features. These calculations, i.e., the WT with five decomposition levels and their ApEn, are repeated for every event in the database. The LDA takes these 5 features of each event in the database and determines a unidimensional projection containing the distributions of spike and non-spike events. From these distributions, each threshold value of the linear classifier generates a pair of TPR and FPR. These pairs are then used to draw the ROC curve.
The Acc and FPR index values are taken at the operating point with the best trade-off between sensitivity and specificity, that is, the point of the ROC curve located at the shortest Euclidean distance from the upper-left corner of the plot.
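A minimal sketch of this selection rule is given below. The ROC points are hypothetical, the class sizes (494 spikes and 1500 non-spikes) come from the database description, and the accuracy is derived from the standard definitions of TPR and FPR, which is an assumption of this sketch.

```python
import math


def closest_to_ideal(points):
    """ROC point (FPR, TPR) nearest to the ideal classifier (0, 1)."""
    return min(points, key=lambda p: math.hypot(p[0], 1.0 - p[1]))


def accuracy_at(fpr, tpr, n_pos, n_neg):
    """Accuracy implied by an operating point and the class sizes."""
    tp = tpr * n_pos
    tn = (1.0 - fpr) * n_neg
    return (tp + tn) / (n_pos + n_neg)


# Hypothetical ROC points as (FPR, TPR) pairs.
roc = [(0.0, 0.0), (0.1, 0.6), (0.2, 0.9), (0.5, 0.95), (1.0, 1.0)]

best_fpr, best_tpr = closest_to_ideal(roc)
acc = accuracy_at(best_fpr, best_tpr, n_pos=494, n_neg=1500)
print(best_fpr, best_tpr, acc)
```

Since this rule picks one point of the curve, two detectors with similar AUC can still differ in Acc and FPR.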

Results
The nine border extension algorithms mentioned above were implemented, and the EEG spike detector performance with each implementation was assessed. All 494 spike events and all 1500 non-spike events were wavelet transformed with 5 decomposition levels for each border extension. The classification results were divided into two groups: Group 1, wavelets having up to 8 coefficients, and Group 2, wavelets having more than 8 coefficients. The performance indexes of the EEG spike detector for each wavelet, as a function of the border extension, are shown in Figure 5. The AUC index for Groups 1 and 2 is presented in Figures 5a and 5b, respectively. Acc results are presented in Figures 5c and 5d, and the FPR index is presented in Figures 5e and 5f.
Figure 5 shows that, for the same wavelet, the border extension algorithm used to extend the signal affects the performance of the EEG spike detector. This behavior is observed in the three performance indexes used to assess the detector in both groups of wavelets. The only exception is the Haar wavelet in Group 1, whose indexes did not change with the border extension as those of the other wavelets did.
In Figure 5, the border extensions that presented the best EEG spike detection were SP0, SP1, SYMH, SYMW, and ASYMW for wavelets having up to 8 coefficients. For wavelets having more than 8 coefficients, the best ones are SYMH, SYMW, and ASYMW.
Considering wavelets having more than 8 coefficients, as seen in Figure 5, SP0 degrades the detector performance. The best border extension for wavelets having more than 18 coefficients varies between ASYMW and SYMH, with considerable differences between them, reaching up to 3.2% for DB20.
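The grouping rule can be illustrated with the wavelets named in this text. Table 1 is not reproduced here, so the list below is only a partial example; the filter lengths rely on the known property that Daubechies (DBN) and Symlet (SymN) wavelets of order N have 2N coefficients and that Haar has 2.

```python
# Filter lengths of the wavelets mentioned by name in the text
# (2N coefficients for DBN and SymN; Haar has 2).
FILTER_LENGTHS = {
    "Haar": 2,
    "DB2": 4,
    "Sym2": 4,
    "Sym4": 8,
    "DB5": 10,
    "Sym5": 10,
    "DB6": 12,
    "DB20": 40,
}


def group_of(n_coefficients):
    """Group 1: up to 8 coefficients. Group 2: more than 8."""
    return 1 if n_coefficients <= 8 else 2


groups = {1: [], 2: []}
for name, length in FILTER_LENGTHS.items():
    groups[group_of(length)].append(name)

print(groups)
```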
The detector performance showed variations in both wavelet groups with any border extension, but border extensions PER, PPD, ZPD, and ASYMH presented the worst performance.
Using the ApEn descriptor the best detector performance was achieved with border extension SP0 with wavelets Haar, DB2, and Sym2, with AUC equal to 0.9332, 0.9300, and 0.9300, respectively.
The highest Acc indexes were achieved using wavelet Sym5 with border SYMW (0.8642), followed by DB2 and Sym2 with SP0 and Sym4 with ASYMW (these last three with accuracy of 0.8541).
The lowest FPR was obtained with wavelets Sym5 with border SYMW (0.1359), followed by DB2 and Sym2 with SP0, and Sym4 with ASYMW (these last three with FPR of 0.1459).
In Figure 5, it is interesting to note that the behavior with the border extension depends on the index used in the evaluation of the detector. This is observed with wavelets Haar, DB2, and Sym2. According to the AUC index, wavelet Haar with SP1 and SP0 performed better than wavelets DB2 and Sym2 with the same border extensions. The results obtained with the Acc and FPR indexes are different. The reason is that the Acc and FPR indexes take into account just one pair of sensitivity and specificity, and both depend on how the linear classifier is configured. AUC, instead, takes into consideration every pair of sensitivity and specificity used to draw the ROC curve, being less vulnerable to the threshold used in the linear classifier. This is shown in Figure 6, where wavelet DB5 produces a higher AUC index than Sym4. In Figure 6, taking the best pairs of sensitivity and specificity denoted by the crossing lines over the ROC curves, the respective Acc and FPR indicate that Sym4 is better than DB5, contradicting the result obtained with the AUC index. Now, considering just the AUC index, the border extension SP0 proved to be the best extension when used with wavelets having up to 8 coefficients. SYMH, which is the default MATLAB border extension, degrades results by up to 3.64% with these wavelets.
SYMW presented the best results for wavelets with 10 and 16 coefficients, while ASYMW proved to be more appropriate for wavelets with 20 and 62 coefficients.
ASYMH presented the worst result, regardless of the size of the wavelet, degrading results by up to 44.11%.

Discussion
The graphs in Figure 5 show that the signal border extension used in the discrete wavelet transformation has a strong influence on the performance of an EEG spike detector.
The border extension SP0 is the best extension mode for wavelets with up to 8 coefficients. This is important since wavelets with up to 8 coefficients are used in the majority of studies found in the literature.
The Haar wavelet (Figure 5) does not suffer a strong influence of the border extension. The worst border extension for this wavelet is ZPD, and even so it only degraded the results by 0.09%. As the Haar wavelet has only 2 coefficients, the effects of the border extension do not spread over the wavelet decompositions as they do with wavelets having more coefficients.
The default border extension used in MATLAB is SYMH. However, for an EEG spike detector, SYMH is not the best option for some wavelets, since it lowers the detector performance. In our experiments, the SYMH algorithm degraded the AUC index by up to 1.06%, the Acc index by 2.9%, and the FPR index by 8.4%. Considering that a 24-hour EEG exam may have thousands of spike events, this may represent numerous misclassified spikes.
The border extensions PER, PPD, ZPD and ASYMH presented the worst performance in EEG spike detection. These border extensions are inadequate for this purpose.
The simplicity of implementing the border extension ZPD may suggest its use in a WT-based EEG spike detector. However, as we have shown, ZPD is not adequate to extend the EEG signal for EEG spike detection.
Regarding the database, the spike events used in this work have durations ranging from 60 ms to 200 ms and were selected by one neurologist. This means that the results presented in this study may reflect both the morphology of the events in the database and the sensitivity and specificity of the neurologist. More data, covering the full range of spike durations (from 20 ms to 200 ms) and selected by several specialists, is necessary. Nevertheless, this work provides a means to measure the variability of the border effect and the distortions caused by its use.
A final comment is that, for an EEG spike detector based on the WT, it is not possible to avoid the use of one of the border extension algorithms presented here. Nevertheless, choosing the right border extension for the selected wavelet will reduce the distortions in the WT results to a minimum, improving the detector performance.
Overall, this work has shown the importance of selecting the right border extension algorithm to enhance the performance of an EEG spike detector. To the best of our knowledge, this is the first publication to present results of a careful study of the influence of the border extension in an EEG spike detector based on the WT. From the experiments shown in this study, the border extension influences the detection result according to the number of coefficients of the wavelet being used for feature extraction. The wavelets frequently found in the literature on EEG spike detection have up to 8 coefficients, and for these wavelets the SP0 border extension is the most adequate. The border extensions PER, PPD, ZPD and ASYMH had the worst performance and are not adequate for EEG spike detection. The border extension SYMH is adequate for wavelets with 12, 18, 24, 30, and 40 coefficients. SYMW presented the best results for wavelets with 10 and 16 coefficients, while ASYMW proved to be more adequate for wavelets with 20 and 62 coefficients.
The results obtained in this study evaluated the discrete wavelet transform and its inherent border extension algorithm exclusively for EEG spike detection. For signals of a different nature and context, it remains open which mother wavelet and border extension are more adequate.
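The small sensitivity of the Haar wavelet can be illustrated with a one-level transform. The sketch below assumes a common toolbox convention (extend by L - 1 samples per side, convolve, keep every second output) and the usual orthonormal Haar filters; under this convention a Haar coefficient only touches a padded sample when the input length is odd. Function names and test signals are hypothetical, and this is not the authors' implementation.

```python
import math


def extend(x, n, mode):
    """Attach n samples per side (three modes only, for illustration)."""
    if mode == "zpd":
        return [0.0] * n + list(x) + [0.0] * n
    if mode == "sp0":
        return [x[0]] * n + list(x) + [x[-1]] * n
    if mode == "symh":
        return list(x[:n][::-1]) + list(x) + list(x[-n:][::-1])
    raise ValueError(f"unknown mode: {mode}")


def dwt_level(x, lo, hi, mode):
    """One DWT level: extend, convolve, then keep every second output."""
    flen = len(lo)
    y = extend(x, flen - 1, mode)

    def analyze(f):
        full = [
            sum(f[j] * y[k + flen - 1 - j] for j in range(flen))
            for k in range(len(y) - flen + 1)
        ]
        return full[1::2]

    return analyze(lo), analyze(hi)


S = 1.0 / math.sqrt(2.0)
HAAR_LO, HAAR_HI = [S, S], [-S, S]  # Haar decomposition filters

even_signal = [1.0, 4.0, 2.0, 8.0, 5.0, 7.0]
odd_signal = even_signal + [3.0]

even_results = {m: dwt_level(even_signal, HAAR_LO, HAAR_HI, m)
                for m in ("zpd", "sp0", "symh")}
odd_results = {m: dwt_level(odd_signal, HAAR_LO, HAAR_HI, m)
               for m in ("zpd", "sp0", "symh")}

print(even_results)
print(odd_results)
```

With the even-length signal all three extensions give identical coefficients; with the odd-length signal only the last coefficient changes. Under the same convention, a 400-sample epoch stays even down to the fourth level, which is consistent with the very small degradation reported for Haar.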

Figure 2
Figure 2 shows an example of the distortions caused by the border extension used in combination with the discrete wavelet transform when dealing with EEG signals. Figures 2a and 2b present the same EEG spike event in an epoch of 2 seconds of duration, where the spike is centered in the sample window. The spike is decomposed into 5 levels by the Daubechies 6 (DB6) wavelet. The difference between Figures 2a and 2b is that in the first one the signal border extension SP0 (Smooth Padding of order 0) is used, while in the second one the signal border extension PER (Periodic Even Padding) is used. It is possible to see that the coefficients near the edge are different. In Figure 2b, the edge coefficients are even higher in value than the central coefficients in decomposition levels D1, D2, and D3. Now, considering feature extraction, when calculating the ApEn of each decomposition level, using the same EEG spike event and wavelet transform, different results are obtained due to the different signal border extensions used, as seen in Figure 2c. Additionally, Figure 2d presents the differences of the ApEn calculated over the spike and non-spike events in the database, where it is possible to see discriminatory information between both classes.

Figure 2. The effect of the border extension in the DWT decomposition. (a, b) The five decomposition levels of the same EEG spike event (top graphics) with DWT DB6, using (a) SP0 and (b) PER as the border extension algorithm; (c) Normalized approximate entropy (ApEn) in the five decomposition levels with the nine described border extension algorithms; (d) ApEn in the five decomposition levels of the selected EEG signal events wavelet transformed with DB6, using SP0 border extension. The star mark shows the respective ApEn for decomposition D3 in (a).

Figure 3. Block diagram of the spike detector considering WT based on different border extensions. The feature ApEn is extracted from the WT decompositions of each event in the database and the number of features is reduced by LDA. The linear classifier separates events between true or false. Performance of the detector is assessed by accuracy, false positive rate (FPR) and the area under the ROC curve (AUC).

Figure 4. Spike and non-spike events distributions after LDA processing are presented in red and blue, respectively. The line under both distributions on the left side is the unidimensional LDA output. The green points over the line are overlapping events. On the right side, the respective ROC curve is presented. Points 1, 2, and 3 are threshold levels of the linear classifier with their respective TPR and FPR values. These are results for wavelet DB5 with border extension SP0.

Figure 5. Behavior of AUC, Acc and FPR indexes for each border extension related to mother wavelets having up to 8 coefficients (group 1) in graphs (a, c and e), and more than 8 coefficients (group 2) in graphs (b, d and f), as described in Table 1.

Figure 6. ROC curves of wavelets (a) DB5 with border SP1 and (b) Sym4 with border SP1. The values of sensitivity, specificity, accuracy, and FPR were obtained at the threshold level denoted by the shortest Euclidean distance between the ROC curve and the ideal classifier (TPR=1, FPR=0).

Table 1. Wavelet groups and their respective number of coefficients.