Accuracy of traditional and formant acoustic measurements in the evaluation of vocal quality

Accepted: April 09, 2018
Study conducted at Departamento de Fonoaudiologia, Universidade Federal da Paraíba – UFPB, João Pessoa (PB), Brasil.
1 Departamento de Fonoaudiologia, Universidade Federal da Paraíba – UFPB, João Pessoa (PB), Brasil.
2 Programa de Pós-graduação em Linguística, Universidade Federal da Paraíba – UFPB, João Pessoa (PB), Brasil.
Financial support: National Council for Scientific and Technological Development (Conselho Nacional de Desenvolvimento Científico e Tecnológico, CNPq), process no. 480168/2013-0.
Conflict of interests: nothing to declare.
ABSTRACT


INTRODUCTION
The voice is essentially a multidimensional phenomenon that includes physiological, perceptual, aerodynamic, acoustic and emotional aspects. Therefore, it is necessary that voice evaluations also follow this principle and that these dimensions are considered and integrated in the process to achieve an overall view of dysphonia (1) .
The goal of voice evaluation is to analyze voice quality, identify whether the voice is healthy or not, diagnose the presence of a perturbation, determine a prognosis, and monitor the patient's progress during voice therapy (2) . The process of voice evaluation generally includes procedures relating to a visual laryngeal examination, auditory-perceptual voice evaluation, acoustic analysis, aerodynamic evaluation and voice self-evaluation (1) .
Auditory-perceptual analysis is considered the primary reference standard used by the speech therapist when performing voice evaluations (2) . It is considered a subjective method, as it depends on the evaluator's judgment and has an exclusively impressionistic nature (2,3) . This type of evaluation provides information about the characterization of voice deviation intensity, as well as the predominant voice quality (4) .
Acoustic analysis is a more objective procedure. It is noninvasive and is increasingly used in the voice clinic. Traditional acoustic analysis uses two types of measure: perturbation measures (jitter and shimmer) and noise measures. Jitter indicates the short-term variability of the fundamental frequency and is measured between neighboring glottal cycles. Shimmer corresponds to the short-term variability in the sound wave amplitude. Glottal-to-noise excitation (GNE) measures the additional noise in the sound signal, irrespective of the noise modulated by the glottal mechanism, and thus indicates whether the voice signal comes from vocal fold vibration or from turbulent airflow generated in the vocal tract. Perturbation and noise measures are therefore focused on the glottal source (3)(4)(5).
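As an illustration of the perturbation measures described above, the following minimal sketch computes a cycle-to-cycle perturbation as the mean absolute difference between neighboring glottal cycles, expressed as a percentage of the mean value. This "local" formulation is one common definition and is an assumption here; the exact algorithm implemented by any given analysis software may differ. The function name and the cycle values are hypothetical.

```python
from statistics import mean


def local_perturbation(values):
    """Mean absolute difference between neighboring cycles, as a percentage of the mean."""
    if len(values) < 2:
        raise ValueError("at least two glottal cycles are required")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    return 100.0 * mean(diffs) / mean(values)


# Hypothetical cycle-by-cycle data (invented for illustration, not study data).
periods_ms = [5.00, 5.02, 4.98, 5.01, 4.99]        # glottal cycle durations
peak_amplitudes = [0.80, 0.78, 0.82, 0.79, 0.81]   # per-cycle peak amplitudes

jitter_percent = local_perturbation(periods_ms)        # frequency perturbation
shimmer_percent = local_perturbation(peak_amplitudes)  # amplitude perturbation
```

The same function serves both measures because, under this definition, jitter and shimmer differ only in the quantity tracked from cycle to cycle (period versus amplitude).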
In addition to these measures, some measures are related to the resonance of the sound wave in the vocal tract, which changes according to the different configurations of the vocal tract structure positioning and volume of the resonance cavities during voice production. Such measures are called formants and correspond to energy concentrations along the vocal tract (3)(4)(5)(6) .
The vocal tract has a three-dimensional configuration and the sound that is produced in the glottis is modified by the positioning of structures such as the larynx, soft palate, tongue, lips and jaw. The frequencies of the glottal signal that are reinforced by the supraglottic vocal tract are called formants, and their analysis provides information about adjustments being made in the supraglottic vocal tract (6)(7)(8)(9)(10) .
Adjustments in the positioning of the articulators and in the volume of the resonance cavities determine the values of formants (6)(7)(8)(11). Thus, an increase in the first formant (F1), for example, is related to a downward jaw adjustment, anterior lowering of the tongue and pharyngeal narrowing. An anterior adjustment of the tongue which is then lowered generates an increase in the second formant (F2). The formation of a smaller cavity immediately behind the incisors can raise the value of the third formant (F3) (6)(7)(8)(10,11).
In this context, there is a strong interaction between the source producing the sound (glottis) and the filter. The feedback from pressure encountered by the sound wave in the vocal tract modifies the glottal airflow and vocal fold vibration mode (12) .
Thus, these adjustments may be related to the development or maintenance of, or may co-occur with, voice disorders (11,13). Such adjustments are not necessarily evaluated by traditional acoustic measures, as these measures focus on the glottal source (16).
Notably, acoustic analysis does not replace auditory-perceptual analysis but rather integrates the auditory and physiological levels (6)(7)(8) . A combination of acoustic and perceptual auditory measures increases the accuracy in determining the presence or absence of a voice disorder and the intensity of the deviation present (17,18) .
For this reason, it is important to investigate whether a combination of measures relating to the source (perturbation and noise) and filter (formantic measures) allows a better classification of voice signals in regard to deviation intensity and predominant voice quality.
This study therefore aims to investigate the accuracy of both isolated and combined traditional acoustic and formantic measures in the discrimination of the voice deviation intensity and predominant voice quality in dysphonic patients. We start from the hypothesis that a combination of traditional acoustic and formantic measures will improve the discrimination of both voice deviation intensity and the different predominant voice qualities.

METHODS

Study design
This was a descriptive, cross-sectional, observational study, evaluated and approved by the Ethics Committee of the Health Sciences Center, Federal University of Paraíba (UFPB), under protocol number 52492/12. All participants signed a free and informed consent form authorizing the study.

Sample
Patients treated at the Department of Speech Therapy's Voice Laboratory (UFPB) between April 2012 and July 2015 participated in this study. The following eligibility criteria were considered for participation:
• Being female, given the relationship between this variable and the mean F0 measure, which is associated with the anatomical characteristics of the vocal folds, which are unequal between adult males and females (16). Furthermore, there is a higher prevalence of voice disorders in this population (19);
• Being over 18 and below 65 years of age, thus avoiding the periods of voice change and presbyphonia, respectively;
• Presenting a voice complaint, answering positively to the following question: "Do you consider that you have a voice problem now or have had one during the past six months?";
• Having undergone a laryngeal visual examination and having an otorhinolaryngological report.
Of the total of 530 patients evaluated in the laboratory, 96 were male, 75 were under 18 or over 65 years of age and 57 individuals had no voice complaints. Thus, 228 individuals were excluded because they did not meet the eligibility criteria, leaving a final sample of 302 patients with a mean age of 39.25 (±12.63) years. No patient had neurological or cognitive impairments that prevented voice recording.

Procedures
All data collection for this study was conducted in the Department of Speech Therapy's Voice Laboratory (UFPB) during the initial voice evaluation session. During this session, the patients were evaluated by means of a form containing questions relating to personal information and voice complaints. They completed voice self-evaluation questionnaires and underwent the recording of speech tasks.
Only the personal identification, voice complaint and sustained vowel sample data were used for this study, as described later.
The voices were collected in a soundproofed recording booth with a noise level below 50 dB SPL, at a 44000-Hz sampling rate and 16 bits per sample, with a 10-cm distance between the microphone and the patient's mouth. Fonoview software, version 4.5 (CTS Informática), was used on a Dell all-in-one desktop, with a Sennheiser E-835 unidirectional cardioid microphone mounted on a pedestal and coupled to a Behringer U-Phoria UMC 204 preamplifier.
For the voice recording, the patient remained standing facing the pedestal at the recommended distance between the mouth and microphone. The patient received instructions about the voice collection, and recording began soon after. During the recording, the patient was asked to emit the sustained /Ɛ/ vowel at a frequency and intensity self-reported as comfortable and normal. The /Ɛ/ vowel was selected for this study because it is an oral, open, unrounded vowel and is considered the vowel with the most intermediate articulatory position in Brazilian Portuguese, which favors a more neutral position of the vocal tract. In addition, it is the most commonly used vowel for evaluating voice quality in Brazil (20).
Subsequently, the voices were edited using SoundForge software, version 10.0. The first and final two seconds of the vowel emission were removed due to the greater irregularity in these sections, with a minimum time of three seconds being retained for each emission. The signals were normalized for the auditory-perceptual evaluation, using SoundForge's "normalize" control in peak level mode, to standardize the audio output at between -6 and 6 dB.
The acoustic measures of the fundamental frequency (mean and standard deviation), jitter, shimmer and glottal-to-noise excitation (GNE) were extracted manually using the voice quality analysis module of VoxMetria software, version 4.7h (CTS Informática, Pato Branco, Paraná, Brazil). The reference values in that software for the jitter, shimmer and GNE parameters are 0.6%, 6.5% and 0.5, respectively. Values greater than those cited for jitter and shimmer are considered deviated, while values lower than that cited for the GNE may be considered deviated.
Praat software, version 5.3.77h, was used to extract the formantic measures from the vowel's representation in a broadband spectrogram containing the first three formants (F1, F2, and F3). Due to the large number of estimations involved, a script was used (a tool that automatically extracts, in a standardized manner, the parametric measures investigated), which facilitated the optimization of processing time and avoided possible handling errors during the estimation procedures. The means and standard deviations of the formant frequencies were extracted for each sample. All values were then checked, and no outliers were identified.
The auditory-perceptual evaluation was performed independently by three speech therapists who were voice specialists with over 10 years of experience in this type of analysis. A visual analogue scale (VAS) ranging from 0 to 100 mm was used (21) to evaluate the voice deviation intensity (general grade [GG]) of the sustained vowel. A score closer to 0 represents a lower voice deviation, and one closer to 100 a greater voice deviation.
Before the auditory-perceptual evaluation, eight sustained /Ɛ/ vowel anchor stimuli were used for the training of the judges. These contained two samples of individuals with normal voice quality variability (NVQV), two samples of individuals with mild to moderate voice deviations, two samples of individuals with moderate voice deviations and two samples of individuals with intense voice deviations. All the files presented contained female voices. The judges were asked to listen to the anchor stimuli immediately prior to analyzing the voices for this study. All samples selected for this training were previously analyzed by speech therapists with experience in voice analysis and were routinely used for perceptual auditory training and as anchor stimuli in the laboratory where this study was conducted.
The perceptual evaluation session took place in a silent environment. First, each judge was told that the voices should be considered as having NVQV when they were socially acceptable, produced naturally, and without effort, noise or unstable conditions during emission. They were also instructed that roughness would correspond to the presence of vibratory irregularities, breathiness would be related to the audible escape of air during the emission and tension would correspond to the perception of vocal effort during the emission.
The auditory-perceptual parameters of roughness, breathiness and tension were chosen to characterize the signals in this study because they are universally used to characterize voice quality deviation (2) and because they have known correlates on the physiological and acoustic planes.
For the evaluation, each sustained vowel emission was presented three times through a speaker at a comfortable intensity as self-reported by the evaluators. The judges then identified the presence or absence of voice deviation, the predominant voice quality in the deviated voices (rough, breathy or tense) and, finally, made a judgment as to the voice deviation intensity.
The VAS was subsequently converted into a numerical scale, with values from 1 to 4, wherein grade 1 represented individuals with NVQV (0-35.5 mm), grade 2 represented subjects with mild to moderate deviation (35.6 to 50.5 mm), grade 3 represented a moderate deviation (50.6 to 90.5 mm) and grade 4 represented an intense deviation (> 90.5 mm) (22) .
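The conversion described above can be sketched as a simple threshold function. This is only an illustration of the published cutoffs: the function name is hypothetical, and the handling of scores that fall between two quoted ranges (for example 35.55 mm, which is assigned to the higher grade here) is an assumption.

```python
def vas_to_grade(vas_mm):
    """Map a 0-100 mm VAS score to the 1-4 numerical scale using the quoted cutoffs."""
    if not 0 <= vas_mm <= 100:
        raise ValueError("VAS score must lie between 0 and 100 mm")
    if vas_mm <= 35.5:
        return 1  # normal voice quality variability (NVQV)
    if vas_mm <= 50.5:
        return 2  # mild to moderate deviation
    if vas_mm <= 90.5:
        return 3  # moderate deviation
    return 4      # intense deviation


# Hypothetical VAS scores, invented for illustration.
grades = [vas_to_grade(score) for score in (20.0, 35.6, 60.0, 95.0)]
```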
At the end of the auditory-perceptual evaluation, 10% of the samples were randomly repeated to evaluate the reliability of the judges' analysis using Cohen's kappa coefficient. The auditory-perceptual analysis results of the judge with the greatest reliability (kappa coefficient of 0.79) were selected for use in this study. The other two judges had kappa values of <0.70.
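For reference, the reliability index mentioned above can be computed as in the following minimal sketch, which assumes the unweighted form of Cohen's kappa applied to a judge's first and repeated ratings of the same samples. The text does not state whether a weighted variant was used, and the ratings below are invented for illustration.

```python
from collections import Counter


def cohens_kappa(first_ratings, second_ratings):
    """Unweighted Cohen's kappa for two sets of categorical ratings of the same items."""
    if not first_ratings or len(first_ratings) != len(second_ratings):
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(first_ratings)
    # Proportion of items rated identically on both occasions.
    observed = sum(a == b for a, b in zip(first_ratings, second_ratings)) / n
    first_counts = Counter(first_ratings)
    second_counts = Counter(second_ratings)
    # Agreement expected by chance from the marginal category frequencies.
    expected = sum(first_counts[c] * second_counts[c] for c in first_counts) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1.0 - expected)


# Hypothetical grades (1-4 scale) given to ten repeated samples by one judge.
first_pass = [1, 2, 2, 3, 1, 2, 3, 2, 1, 2]
second_pass = [1, 2, 2, 3, 1, 2, 2, 2, 1, 3]
kappa = cohens_kappa(first_pass, second_pass)
```

In this invented example the judge agrees with herself on 8 of 10 samples, but kappa is lower than 0.80 because part of that agreement is expected by chance.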
The patients were categorized into two groups according to the auditory-perceptual analysis results as follows: 33 patients with NVQV (GG≤35.5 mm) and 269 patients with voice quality deviations (GG≥35.6 mm). Of the patients with voice quality deviations, 150 were classified as mild to moderate (35.6≤GG≤50.5 mm), 112 as moderate (50.6≤GG≤90.5 mm) and 7 as having an intense deviation (GG> 90.5 mm). Of the 269 patients with voice quality deviations, 135 (50.18%) had a predominantly rough voice quality, 95 (35.31%) had a predominantly breathy voice quality and 39 (14.49%) had a predominantly tense voice quality.
The otorhinolaryngological reports of the 33 NVQV patients showed voice complaints and a lack of structural and functional laryngeal changes. Of the 269 patients with voice quality deviations, all had voice complaints; 30 had a medical diagnosis of an absence of structural and functional laryngeal changes, and 239 were diagnosed with laryngeal changes, as described above.
This sample characterization is consistent with the literature, as there is no direct relationship between the presence of a voice complaint, the presence of voice quality deviation and the presence of laryngeal changes (5) . Therefore, given that the purpose of this study was not to evaluate the acoustic parameters according to the presence or not of a speech disorder but to clarify the relationship between auditory-perceptual parameters and acoustic measures in evaluating the intensity and type of voice deviation, we decided not to exclude individuals with voice complaints but no laryngeal changes. These criteria strengthen the internal validity of the study and ensure that the independent variable (auditory-perceptual evaluation) is the only or most likely explanation for the effect on the dependent variable (acoustic parameters).

Data analysis
Descriptive statistical analyses were performed for all variables, including the mean and standard deviation values. Quadratic discriminant analysis (QDA) was performed to classify the signals as a function of the GG and predominant voice quality, with K-fold cross-validation used as an auxiliary method.
QDA was selected for this study because it allows identifying individual and combined variables that best discriminate between pre-established groups (GG and predominant voice quality). Eight acoustic measures were analyzed in the combined measure analysis and were combined 2 by 2, 3 by 3, 4 by 4, up to 8 by 8.
In the K-fold cross-validation method, the classification was performed ten times, varying the data set used for training and testing without repetition, so that more accurate results can be obtained (22). Thus, signals with different GGs and predominant voice qualities were randomly divided into subsets, with a minimum of 10 signals in each subset, as this minimum number of signals facilitates the best error estimates. Signals with intense deviations were excluded from the analysis because, with only 7 signals, they did not satisfy the condition of having a minimum of 10 signals.
These subsets were compared by the means of the cross-validation procedure, and for each iteration between subsets, performance measures (accuracy, sensitivity and specificity) were obtained for the classifier when discriminating the GG or predominant voice quality. At the end of all subset iterations, the mean and standard deviation values of the formed subsets were extracted and used to interpret the final classifier data.
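The procedure described above can be illustrated for a single isolated measure with the following minimal sketch. It is not the implementation used in the study, whose software is not stated: it assumes a one-feature QDA (one Gaussian with its own mean and variance per class), ten disjoint folds, and invented GNE-like values. All function names are hypothetical.

```python
import math
import random
from statistics import mean, pstdev, variance


def fit_qda_1d(values, labels):
    """Estimate a prior, mean and variance for each class (single feature)."""
    model = {}
    for cls in sorted(set(labels)):
        xs = [x for x, y in zip(values, labels) if y == cls]
        model[cls] = (len(xs) / len(values), mean(xs), variance(xs))
    return model


def predict_qda_1d(model, x):
    """Return the class with the largest Gaussian log-posterior (up to a constant)."""
    def score(cls):
        prior, mu, var = model[cls]
        return math.log(prior) - 0.5 * math.log(var) - (x - mu) ** 2 / (2.0 * var)
    return max(model, key=score)


def k_fold_accuracy(values, labels, k=10, seed=0):
    """Mean and standard deviation of the accuracy over k disjoint test folds."""
    indices = list(range(len(values)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # each signal is tested exactly once
    accuracies = []
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in indices if i not in held_out]
        model = fit_qda_1d([values[i] for i in train_idx],
                           [labels[i] for i in train_idx])
        hits = sum(predict_qda_1d(model, values[i]) == labels[i] for i in test_idx)
        accuracies.append(hits / len(test_idx))
    return mean(accuracies), pstdev(accuracies)


# Hypothetical GNE-like values, invented for illustration (not study data).
values = [0.80 + 0.002 * i for i in range(30)] + [0.50 + 0.006 * i for i in range(30)]
labels = ["NVQV"] * 30 + ["deviated"] * 30
mean_accuracy, sd_accuracy = k_fold_accuracy(values, labels)
```

Reporting the mean and standard deviation across folds mirrors the "mean±SD" format used for the accuracy values in the Results. For combinations of several measures, the per-class variance would be replaced by a per-class covariance matrix.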
Accuracy, sensitivity and specificity measures were used to evaluate the classifier's performance. In general, the interpretation of the sensitivity and specificity measures is most evident when the groups being compared belong to a healthy (no changes) or pathologic (with changes) class (23) . Therefore, when performing discriminant analysis between classes with changes, such as performed in this study (when different deviation and predominant voice quality intensities were compared), it is necessary to determine in the classifier used the signal group that will have its correct classification measured by the sensitivity and the group that will have its correct classification measured by the specificity.
Therefore, a standard procedure was adopted in which the first condition presented in each table would correspond to the signal that would be classified correctly by the specificity, while the second condition would be classified correctly by the sensitivity (Chart 1).
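The convention adopted above can be made explicit with a short sketch: the first condition of each comparison is scored by the specificity and the second by the sensitivity. The function name and the labels below are hypothetical and serve only to illustrate the bookkeeping.

```python
def classifier_performance(true_labels, predicted_labels, first_condition, second_condition):
    """Accuracy, sensitivity (second condition) and specificity (first condition)."""
    pairs = list(zip(true_labels, predicted_labels))
    first_total = sum(t == first_condition for t, _ in pairs)
    second_total = sum(t == second_condition for t, _ in pairs)
    if first_total == 0 or second_total == 0:
        raise ValueError("both conditions must be present in the true labels")
    specificity = sum(t == first_condition and p == first_condition for t, p in pairs) / first_total
    sensitivity = sum(t == second_condition and p == second_condition for t, p in pairs) / second_total
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    return {"accuracy": accuracy, "sensitivity": sensitivity, "specificity": specificity}


# Hypothetical comparison "NVQV vs. rough" (invented labels, not study data).
true_labels = ["NVQV"] * 4 + ["rough"] * 6
predicted = ["NVQV", "NVQV", "NVQV", "rough",
             "rough", "rough", "rough", "rough", "rough", "NVQV"]
performance = classifier_performance(true_labels, predicted, "NVQV", "rough")
```

In this invented example, 3 of 4 NVQV signals and 5 of 6 rough signals are classified correctly, so the specificity is 0.75 and the sensitivity is approximately 0.83.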
The classification performance took into account signals with different GGs and different predominant voice qualities. The individual power of each of the considered acoustic measures and possible combinations of these measures were also considered, identifying those that provided the best classification rates between voice signals under the conditions established in this study.

RESULTS
Tables 1 and 2 show the means and standard deviations of the acoustic measures as a function of GG and predominant voice quality, respectively. These data will not be examined separately but in conjunction with the performance of the classifications used.
First, the accuracy of the isolated acoustic measures in discriminating the GG in the patients was tested. The GNE measure had the best performance (70.95%, SD = 3.05), achieving a sensitivity of 86.67±5.44% and specificity of 55.83±5.13% (Table 3).
When investigating the discriminatory power of the combined acoustic measures in the classification of GG in the investigated sample, the greatest accuracy was found in the following combinations: the means of F0, F2 and GNE (75.24±4.86%) when distinguishing between NVQV and mild to moderate deviations; and the SDs of F0, F1, F3, jitter and GNE (74.02±3.26%) when discriminating between mild to moderate and moderate deviations (Table 3).
The accuracy of the isolated measures in the discrimination of predominant voice quality was analyzed next. GNE performed best in discriminating between NVQV and rough (73.57±5.56%), between NVQV and breathy (82.38±3.73%) and between breathy and tense (71.43±4.76%) (Table 4).
Finally, the performance of the combined acoustic measures in the discrimination of the voice quality was tested. The means of F0, shimmer and GNE (78.57±4.21%) were the best combination when discriminating between NVQV and rough voice quality. The means of F3 and GNE (84.05±3.29%) were the best combination for distinguishing between NVQV and breathy voice quality. The means of F0, F3, and GNE (73.75±3.75%) were selected as the best combination for discriminating between rough and tense voices. The combination of the means of F0, F1 and GNE (75.71±6.41%) offered the best performance when discriminating between breathy and tense voices (Table 4).

DISCUSSION
This study investigated the accuracy of both isolated and combined traditional acoustic and formantic measures in the discrimination of GG and predominant voice quality in dysphonic patients. Two hypotheses were raised as follows: 1) the combination of traditional acoustic and formantic measures improves the discrimination of GG in voices, and 2) the combination of traditional acoustic and formantic measures improves the discrimination of different predominant voice qualities. Thus, the discussion section was organized to elucidate the conclusions reached with regard to these hypotheses.

Traditional acoustic and formantic measures in the discrimination of voice deviation intensity
When analyzing the isolated acoustic measures, only GNE showed acceptable performance (70.95±3.05%) in the discrimination between NVQV voices and voices with mild to moderate deviations, with higher sensitivity (86.67±5.44%) in the correct identification of signals with deviation.
The GNE measure appeared to be lower in patients with mild to moderate deviation than in individuals with NVQV. However, this measure did not produce values in either of the two groups that were below the 0.5 cut-off point considered for the presence of deviation in this parameter. In turn, in the comparative analysis, it could be inferred that patients with mild to moderate voice deviation had more silent airflow between the vocal folds than those with NVQV (5,11).
A study (4) conducted with 226 patients, 53 healthy controls and 173 patients with voice deviations demonstrated that GNE showed excellent accuracy (95%) when differentiating between healthy voices and those with deviations. Thus, it may be inferred that GNE is a good voice evaluation measure because it shows greater discrimination between healthy and deviated voices.
Based on the analysis of the combined acoustic measures, the hypothesis that a combination of traditional and formantic measures would improve the performance of the classifier in the discrimination of GG was confirmed. In addition to increasing the accuracy and specificity values, the combination of measures was able to discriminate between mild to moderate and moderate deviations, which the isolated measures could not. The combination of measures relating to the means of F0, F2 and GNE obtained an accuracy of 75.24±4.86% when discriminating between signals with NVQV and those with mild to moderate deviations. Patients with mild to moderate deviations had lower GNE values and greater mean F0 and F2 values than did patients with NVQV.
Lower GNE values may indicate inefficient glottal closure, more additive noise in the voice and a possible decrease in intensity (4,5,24) . In turn, data in the present study in regard to GNE were analyzed comparatively between groups as no values were below the cutoff in either group of signals.
The mean F0 values found were linked to the presence of longitudinal vocal fold tension, which causes a greater number of glottic cycles per second, resulting in a greater F0 elevation (25).
Increased F2 values are related to adjustments in the tongue anteriorization (6)(7)(8). Such adjustments promote the elevation of the laryngeal complex, and by means of a biomechanical action, there is a greater longitudinal tension in the vocal folds, with a consequent rise in F0, increased vocal effort and decreased voice projection (14,25).
A study (26) analyzed the formantic measures of sustained vowels and found an increase in the values of these measures when the laryngeal complex was elevated. Furthermore, F0 values decreased when the vocal tract length increased (low larynx) and similarly increased when the vocal tract length decreased (high larynx).
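The direction of the formant change reported above is consistent with the textbook idealization of the vocal tract as a uniform tube closed at the glottis and open at the lips, whose resonances are F_n = (2n - 1)c / 4L. The sketch below applies this simplification, which is not taken from the cited study; the tract lengths and the speed of sound (35000 cm/s) are assumed illustrative values.

```python
def uniform_tube_formants(tract_length_cm, count=3, speed_of_sound_cm_s=35000.0):
    """Resonances (Hz) of a uniform tube closed at one end: F_n = (2n - 1) * c / (4 * L)."""
    if tract_length_cm <= 0:
        raise ValueError("tract length must be positive")
    return [(2 * n - 1) * speed_of_sound_cm_s / (4.0 * tract_length_cm)
            for n in range(1, count + 1)]


# Hypothetical tract lengths: a neutral posture and a shortened tract (raised larynx).
neutral = uniform_tube_formants(17.5)
raised_larynx = uniform_tube_formants(15.0)
```

In this idealized model, shortening the tube raises every formant by the same proportion, whereas real articulatory adjustments affect individual formants differently, as described in the Introduction.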
It may be inferred from these findings that compared to individuals without voice quality deviation, patients with a mild to moderate degree of deviation may implement supraglottic adjustments to compensate for dysfunctional glottic conditions with the presence of increased silent airflow. These findings are consistent with other studies (8)(9)(10)(13)(14)(15) showing that dysphonic patients tend to make adjustments in the vocal tract to compensate for their voice problem.
Nonetheless, one can question whether the supraglottic adjustment may be related to the source of the voice problem in these patients as the elevation of the larynx with increased longitudinal vocal fold tension reduces the convexity of the curvature of the free edge of the vocal folds, which is one of the mechanisms responsible for increased transglottic silent airflow (27) .
In general, the description and analysis of the formantic measures in the group with mild to moderate deviations seems to be interesting for understanding the supraglottic adjustments made by these patients, which may have implications for clinical evolution in voice therapy.
The measure combination of the SD of F0, F1, F3, jitter and GNE also had an acceptable performance (74.02±3.26%) when discriminating between signals with mild to moderate deviation and those with moderate deviation. The measures of the SD of F0, F1, F3 and jitter were higher in patients with moderate deviation, while the GNE values were lower in these patients than in individuals with mild to moderate deviation. In regard to the reference values for the GNE and jitter measures, only the latter produced values above the cutoff point for being considered deviated.
In physiological terms, the SD of F0 is directly related to the neuromuscular condition and vocal fold mucosa vibration regularity; thus, higher F0 SD values, as found in patients with moderate deviations, may indicate phonatory instability and greater vocal fold vibration irregularity, thereby causing deviations in voice production (24,25).
Jitter evaluates perturbations in the frequency of the neighboring vibration cycles (11,18) and is the measure most correlated with GG (17) and sensitive to the presence of voice deviations. This explains its increase in individuals with moderate voice deviations in this study.
These data suggest that patients with moderate voice deviations have more irregular vocal fold vibrations and phonatory instability (increased SD of F0), greater silent airflow, more noise in the voice (decreased GNE) and a greater overall intensity of voice deviation (increased jitter) than do patients with mild to moderate deviations.
The increase in F1 values is related to the greater lowering of the oromandibular complex and to oropharyngeal narrowing (6)(7)(8)(10,11). These cited supraglottic adjustments may occur as a compensation for dysfunctional glottic conditions, as a greater degree of jaw opening and pharyngeal narrowing may cause a decrease in auditorily perceived breathiness (27) and increased voice intensity (8)(9)(10)(17). An increase in F1 is also associated with the phonatory effort present in dysphonic patients with muscular tension (14).
The hypothesis that a combination of traditional acoustic and formantic measures can improve discrimination in regard to GG was confirmed. The information seems to have a complementary nature, as formantic measures alone did not show acceptable discriminatory performance in the cases studied. Notably, in this study, an auditory-perceptual rating scale focused on the glottal source was used; therefore, one would expect a greater contribution from acoustic measures related to the glottal source.
However, more deviated voices seem to make greater supraglottic adjustments, as the higher values found in the combination of measures would be related to sensitivity, i.e., indicate the most deviated signals correctly.

Traditional acoustic and formantic measures in the discrimination of predominant voice quality
When analyzing acoustic measures alone, only GNE had an acceptable performance when discriminating between voices in terms of predominant voice quality.
In regard to the discrimination between NVQV and rough voices, an accuracy of 73.57±5.56% was found, with greater sensitivity (88.33±4.84%) for the correct identification of rough voices. Regarding the NVQV vs. breathy discrimination, an accuracy of 82.38±3.73% was found, with greater sensitivity (87.50±5.16%) in the correct identification of breathy voices. In the breathy vs. tense discrimination, an accuracy of 71.43±4.76% was found, with greater specificity (81.67±4.08%) in the correct identification of breathy voices.
Once again, in an isolated form, only the GNE measure showed acceptable values in the discrimination of the different voice qualities. In this context, GNE proved especially important in differentiating breathy voices from other voice types. This finding is probably because GNE is directly related to the source of the voice signal, i.e., whether it comes from vocal fold vibration or turbulent airflow generated in the vocal tract (4,5) . This factor could explain the direct relationship with this parameter.
The hypothesis that a combination of traditional acoustic and formantic measures can improve the discrimination of predominant voice quality was confirmed, as the combination of these measures improved the performance of the classifier when discriminating between NVQV and rough, NVQV and breathy and breathy and tense voices. It also provided acceptable discrimination between rough and tense voices.
When discriminating between NVQV and rough voices, the best combination found was the measures of the means of F0, shimmer and GNE. This combination had an accuracy of 78.57±4.21% and greater sensitivity (87.50±5.16%) in the correct identification of rough voices. The means of F0 and shimmer values were higher in patients with a rough voice quality, while the GNE values were reduced in relation to NVQV voices.
In general, it is expected that rough voices will have lower F0 values (18). However, the increase of this measure in this study may be explained by the fact that patients with rough voices possibly had tension associated with emission and that, therefore, there was an increase in F0 (2,14,28) compared to patients with NVQV.
Shimmer is a measure related to the variation in amplitude between adjacent cycles and is thus related to vibratory irregularity and glottic resistance (4,29) . On the auditory-perceptual plane, previous studies have shown that shimmer is related to roughness (17,18) . The shimmer values in this study contributed to the correct identification of rough voices. It should be noted that although the shimmer values are most deviated in voices with roughness, these values are still within the normal range, given the cutoff values adopted.
The objectives of one study (18) included an analysis of the discriminatory power of acoustic measures when classifying deviation intensity and differentiating predominant voice types. A total of 186 dysphonic patients participated in the study. The measures used were the fundamental frequency (F0), jitter, shimmer and GNE. The results showed that the shimmer and GNE were useful in detecting rough and breathy voices, respectively.
Data from the aforementioned study (18) appear similar to those found in the present one, as shimmer was related to the roughness parameter, while GNE, although appearing in all combinations, seemed to be more sensitive in relation to voices with a breathy quality.
The F3 and GNE measures were selected as the best combination when discriminating between NVQV and breathy voices (84.05±3.29%) and had high sensitivity (90.00±5.09%) in the correct identification of breathy voices. Patients with breathy voices had higher F3 values and lower GNE values.
The F3 frequency is related to the two cavities established by the tongue position, that is, the cavity behind the tongue constriction and the one in front of it. The F3 frequency can also be affected by adjustments to the lips, larynx and pharynx, and it has a tendency to decrease with labiodentalization adjustment and lip rounding and to increase with constriction around the pharynx (3,10,11,20). Thus, one can infer that patients with a predominantly breathy voice quality have a greater constriction around the pharynx and more stretched lips, probably as compensatory mechanisms to increase voice intensity.
The findings of this study reinforce the fact that the GNE measure is strongly related to the breathy voice quality (4,5,18,28) and is the only isolated measure with acceptable accuracy when discriminating between NVQV and breathy signals.
When discriminating between rough and tense voices, the best combination found was the mean F0, F3 and GNE measures (73.75±3.75%), and this combination had greater specificity (84.17±5.75%) when identifying rough voices. The mean F0 was lower in patients with roughness than in those with tense voices, F3 had higher values in patients with roughness, and the GNE was higher in patients with a tense voice quality.
The findings suggest that patients with a tense voice quality may have greater longitudinal tension in the vocal folds, as indicated by the higher mean F0 values. Furthermore, it appears that patients with roughness have a smaller cavity in the vocal tract, as suggested by the increase in F3 (11,13), and that patients with a tense voice quality seem to have less noise in the voice (4,5) than patients with roughness, an aspect suggested by the fact that the GNE is less deviated in tense voices.
The rough vs. tense discrimination category appeared only when there was a combination of measures and there was no acceptable isolated value. This demonstrates the importance of finding the best combination of formantic measures to identify voice quality (4,24).
The measures relating to the mean F0, F1 and GNE were selected for discrimination between breathy and tense voices, with an accuracy of 75.71±6.41% and with higher specificity (78.33±8.16%) in the correct identification of breathy voices. The F0 and F1 values were greater in patients with tense voices, and the GNE was lower in patients with breathy voices.
Regarding the mean F0 and tense voice quality, it is important to note that the fundamental frequency is determined, among other factors, by vocal fold tension, which is controlled by the intrinsic laryngeal muscles, specifically the cricothyroid (2,11,15). Thus, patients with vocal tension usually exhibit greater contraction of the extrinsic and intrinsic muscles, including greater longitudinal vocal fold tension, greater subglottic pressure and greater vocal tract constriction, generating a larger number of glottic cycles per second and hence a greater fundamental frequency (25).
The general grade and roughness seem to be parameters more related to F0 (28,30). The mean F0 values are higher both in general grade and in vocal tension, and the F0 standard deviation values are also high in rough voices. This study's findings seem to agree in regard to the increase in F0 in patients with tense voices and the positive relationship between F0 and the general grade of voice deviation.
In relation to the increased F1 values, it would seem that patients with a tense voice quality may make adjustments in the vocal tract, having a larger vertical opening of the mouth and greater pharyngeal constriction (6-8,10,11) than patients with a breathy voice quality.
A study (14) conducted with 111 women with muscle tension dysphonia found similar results. The F1 and F2 formants were elevated in this population compared to those with healthy voices, suggesting adjustments in the supraglottis relating to a greater vertical opening of the mouth, greater pharyngeal constriction and a lower and more anterior tongue position. The adjustments found in that study are similar to those of the present study in regard to the greater vertical opening of the mouth and increased pharyngeal constriction as indicated by an increase in F1 in patients with a tense voice quality.
Analysis of the combined acoustic measures in the discrimination of the predominant voice quality again revealed that the GNE measure appeared in all acceptable combinations found. The F0 measure was present in most of the combinations when discriminating predominant voice quality, which corroborates the results of previous studies (8,13,18,29,30) in which the fundamental frequency appeared to be a useful measure when discriminating voice quality. This is probably because it is related, in physiological terms, to the neuromuscular condition and vocal fold mucosa vibration regularity, and in acoustic and perceptual terms, it is directly related to the sound signal periodicity (6,9,11,30).
In summary, a combination of perturbation/noise measures and formantic measures promotes a slight improvement in the classification rate between voices with NVQV and those with mild to moderate deviation (75.24%) compared with the GNE measure alone (70.95%). This combination also facilitates discrimination between voices with mild to moderate and moderate deviations, which was not observed with isolated measures. These findings offer evidence that the greater the voice deviation intensity, the more complex the signal in terms of the aperiodicity and noise. Such intensities therefore require a combination of measures to characterize them adequately.
Furthermore, a combined analysis of measures relating to the glottal source (perturbation and noise) and filter (formantic measures) contributes to a broadening of our understanding of source-filter interaction mechanisms in deviated voices and may be useful when measuring the results of treatment and monitoring during voice therapy. The fact that more formantic measures (F1 and F3) were selected by the classifier for discriminating more deviated voices shows that individuals with more intense deviations make more vocal tract adjustments, probably as a compensatory mechanism in response to the functional inefficiency of the glottal source.
In regard to the predominant voice quality, the formantic measures proved important when classifying between NVQV and breathy (F3), rough and tense (F3) and breathy and tense (F1) voices. Specifically, the formantic measures seem to provide a greater contribution to the discrimination of the auditory-perceptual parameter tension. Individuals with tense voices probably make more supraglottic adjustments, either for compensatory reasons or in co-occurrence with the alterations at the glottic level.
The presence of a voice disorder tends to change the voice signal in different ways and may combine various types of perturbation and noise in vocal emissions as well as possible supraglottic adjustments. The combined use of measures for the evaluation, characterization and classification of the voice signal may therefore better represent voice production characteristics and highlight manifestations that would not be detected with the use of isolated measures. Other studies (3,28,30) have shown that a combination of perturbation and noise measures improves the discrimination between signals with and without voice deviations. The present study further indicates that combining measures related to vocal tract adjustments with traditional perturbation and noise measures can improve the classification of the voice deviation intensity and type and provide insights into the source-filter interaction in patients with voice deviations.

CONCLUSION
The GNE acoustic measure was the only one able to discriminate voice deviation intensity and predominant voice quality in isolation. There was a gain in the classification performance when traditional acoustic and formantic measures were combined in the discrimination of both the voice deviation intensity and predominant voice quality.