Auditory-perceptual evaluation of rough and breathy voices: correspondence between visual analog and numerical scales

Accepted: May 26, 2015
Study carried out at Centro de Estudos da Voz – CEV, São Paulo (SP), Brazil, and at the Departamento de Fonoaudiologia, Faculdade de Odontologia de Bauru, Universidade de São Paulo – USP, Bauru (SP), Brazil.
1 Programa de Pós-graduação Interunidades em Bioengenharia – EESC/IQSC/FMRP, Universidade de São Paulo – USP, São Carlos (SP), Brazil.
2 Departamento de Fonoaudiologia, Faculdade de Odontologia de Bauru, Universidade de São Paulo – USP, Bauru (SP), Brazil.
3 Centro de Estudos da Voz – CEV, São Paulo (SP), Brazil.
4 Departamento de Engenharia Elétrica, Universidade Federal de São Carlos – UFSCar, São Carlos (SP), Brazil.
Financial support: none.
Conflict of interests: nothing to declare.


BACKGROUND
Auditory-perceptual analysis is the main voice assessment procedure used by speech therapists in clinical practice (1). It characterizes voice quality, identifies the deviated vocal parameters and the degree of vocal deviation, and supports inferences about their social impact (2-5). On the other hand, auditory-perceptual assessment is essentially perceptual in nature, which makes it subject to errors and variation: it can be affected by factors such as the evaluator's past experience (6), knowledge or lack of knowledge of the clinical data, the type of task assessed, voice or speech (5,7), and the protocol used (8).
Thus, minimizing the variation that arises from subjectivity is a challenge in improving auditory-perceptual evaluation. In this context, assessment scales were created and validated (9), among them numerical and visual analog scales. The most widely used scales are the CAPE-V (Consensus Auditory-Perceptual Evaluation of Voice) and the GRBAS (overall dysphonia Grade, Roughness, Breathiness, Asthenia, and Strain), both of which show good reliability indexes (10-12), while continuous (visual analog) scales show better agreement indexes than numerical scales (10,11,13).
On the other hand, visual analog scales are usually tied to qualitative delimitations of the degree of vocal deviation (14). ASHA (American Speech-Language-Hearing Association), in its text on the CAPE-V, recommends reporting the value in millimeters together with a qualitative description of the degree of vocal deviation (e.g., mild, moderate, severe) (15). Recent studies have established cutoff values in this context for normal variability and for the mild, moderate, and severe degrees (10,16,17). However, these studies focus on the overall degree of vocal deviation and lack information on cutoff values for specific parameters.
Roughness, breathiness, and tension are among the specific vocal parameters most often found in dysphonic individuals (18). As early as 1996, studies showed breathiness and roughness to be two characteristics clearly identified in auditory-perceptual evaluation (19). Furthermore, both roughness and breathiness are present in the most widely used auditory-perceptual evaluation scales, GRBAS and CAPE-V.
In summary, the visual analog scale needs specific limits that allow the professional to relate quantitative values to qualitative concepts, since this correspondence is essential for an adequate interpretation of auditory-perceptual evaluation results. This work addressed the presence/absence of roughness and breathiness because they are the most recurrent vocal parameters among dysphonic individuals (18); the investigation of additional specific vocal parameters is suggested for future studies.
Thus, this study aimed to determine the cutoff values for the different degrees of rough and breathy voices on a visual analog scale, based on a numerical scale.

METHODS
This study was approved by the Research Ethics Committee under opinion letter n. 872.185. Voice samples were obtained from the database of a Voice Laboratory and consist of voices of patients examined at its outpatient clinic.
Recordings of the sustained vowel /a/ from 150 individuals of both genders, aged 18 years or older, were selected for this study. The sustained vowel /a/ was chosen because it is one of the vowels used by several acoustic analysis programs, allowing future comparison with acoustic voice analysis, which currently accounts for approximately 60% of the publications about voice (20).
The voice recordings in the database were selected by a speech therapist so as to range from neutral voices to voices with severe roughness and/or breathiness. After this selection, voices were kept in or excluded from the study according to the evaluation by four judges and the sample inclusion criteria.
The inclusion criteria for voice samples were: age 18 years or older, a predominantly rough or breathy vocal quality, and a maximum difference of 10 millimeters (mm) among the ratings of at least three judges.
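The agreement criterion can be illustrated with a minimal sketch. It assumes one reading of the criterion, namely that a voice is kept when at least three of the four visual analog ratings lie within 10 mm of one another; the function name and the example ratings are hypothetical and are not taken from the study data.

```python
from itertools import combinations

MAX_SPREAD_MM = 10.0  # maximum difference stated in the inclusion criteria


def meets_agreement_criterion(ratings_mm, min_judges=3, max_spread=MAX_SPREAD_MM):
    """Return True if at least `min_judges` ratings lie within `max_spread` mm.

    Assumed interpretation: it is enough that one group of `min_judges`
    judges agrees within the allowed spread. Checking groups of exactly
    `min_judges` is sufficient, because any larger agreeing group
    contains such a subgroup.
    """
    for group in combinations(ratings_mm, min_judges):
        if max(group) - min(group) <= max_spread:
            return True
    return False


# Hypothetical ratings (mm) from four judges for two voices.
print(meets_agreement_criterion([20, 25, 28, 60]))  # three judges within 8 mm
print(meets_agreement_criterion([10, 25, 40, 60]))  # no three judges within 10 mm
```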
The database recordings were obtained with an AKG C444PP headset microphone, positioned laterally at 60° and 5 centimeters from the lip commissure. The audio signal was recorded with Sound Forge 10.0 at a sampling rate of 44,100 Hz, 16 bits, mono channel, on a computer with a Creative Sound Blaster Audigy II sound card.
To standardize the voice samples, the sustained vowel /a/ was edited in Sound Forge 10.0 by discarding the first second of the emission and selecting the following three seconds; where there was an abrupt irregularity, a more stable segment of the recording was selected.
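The default editing rule can be sketched in code. The study performed this step manually in Sound Forge; the function below is only an illustration, with hypothetical names, that assumes the recording is available as a sequence of samples at 44,100 Hz.

```python
SAMPLE_RATE_HZ = 44_100  # sampling rate reported for the recordings


def select_segment(samples, sample_rate=SAMPLE_RATE_HZ, skip_s=1.0, keep_s=3.0):
    """Discard the first `skip_s` seconds and return the next `keep_s` seconds."""
    start = int(round(skip_s * sample_rate))
    stop = start + int(round(keep_s * sample_rate))
    if len(samples) < stop:
        raise ValueError("recording is shorter than the requested segment")
    return samples[start:stop]


# Hypothetical 5-second recording represented by its sample indices.
recording = list(range(5 * SAMPLE_RATE_HZ))
segment = select_segment(recording)
print(len(segment) / SAMPLE_RATE_HZ)  # duration of the selected segment in seconds
```

The sketch does not cover the manual choice of a more stable segment when the emission contains an abrupt irregularity.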
The auditory-perceptual evaluation assessed two parameters, roughness (R; irregularity in vocal fold vibration) and breathiness (S; audible air leakage in the voice), using two protocols: the visual analog scale, ranging from 0 (zero) to 100 mm, with 0 mm representing the absence of the evaluated parameter and 100 mm its maximum intensity; and the 4-point numerical scale, in which 0 (zero) represents the absence of the evaluated parameter, 1 a mild degree, 2 a moderate degree, and 3 a severe degree.
The evaluation was performed by four speech therapists specialized in voice, each with at least five years of experience in vocal evaluation. The evaluation was organized in two steps: in the first, the speech therapists were instructed to evaluate the voices using the visual analog scale and, in the second, using the 4-point numerical scale. The interval between the two evaluations was 30 days. The speech therapists were previously trained and instructed to indicate the absence, or grade the presence, of roughness and/or breathiness on both scales. Voices were presented in random order, and the speech therapists made individual evaluations in an acoustically treated environment, using Behringer HPX2000 headphones on a computer with a Creative Sound Blaster Audigy II sound card. This setup was chosen to avoid sources of bias such as noise in the evaluation environment and differences between sound cards and headphones.
The Intraclass Correlation Coefficient (ICC) was used for the statistical analysis of inter- and intra-judge agreement, with values below 0.4 considered poor agreement, values between 0.4 and 0.75 satisfactory agreement, and values above 0.75 excellent agreement. Ten percent of the samples were randomly repeated for the intra-judge analysis in the application of both scales.
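The interpretation bands for the ICC can be written as a small helper. This is a sketch with a hypothetical function name; how the boundary values 0.4 and 0.75 themselves are classified is an assumption, since the text only gives the ranges.

```python
def classify_icc(icc):
    """Map an ICC value to the agreement band used in this study.

    Assumption: 0.4 and 0.75 themselves fall in the 'satisfactory' band.
    """
    if icc < 0.4:
        return "poor"
    if icc <= 0.75:
        return "satisfactory"
    return "excellent"


for value in (0.30, 0.60, 0.77, 0.85):
    print(value, classify_icc(value))
```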
We used the ROC curve to set the cutoff values for grading roughness and breathiness, based on values of sensitivity, specificity, and efficiency. The ROC curve was calculated from the mean of the visual analog scale ratings and the mode of the numerical scale ratings. The maximum efficiency rule was used to estimate the cutoff values, considering the highest values of sensitivity and specificity. Efficiency corresponds to the area under the ROC curve, and cutoff values with efficiency closer to 1 were considered more precise.
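The cutoff search can be sketched as follows. This is not the authors' statistical procedure: it assumes that each cutoff separates voices whose modal numerical degree is at or above a given degree from the rest, that candidate cutoffs are midpoints between consecutive observed visual analog means, and that efficiency is the mean of sensitivity and specificity. All function names and example values are hypothetical.

```python
from statistics import mean, mode


def aggregate(vas_ratings_mm, numeric_ratings):
    """Mean visual analog rating (mm) and modal numerical degree for one voice."""
    return mean(vas_ratings_mm), mode(numeric_ratings)


def best_cutoff(vas_means, degrees, min_degree):
    """Return (cutoff, efficiency, sensitivity, specificity) with maximum efficiency.

    A voice is 'positive' when its modal degree is >= min_degree (assumption).
    Efficiency is taken as (sensitivity + specificity) / 2 (assumption).
    """
    positives = [d >= min_degree for d in degrees]
    if all(positives) or not any(positives):
        raise ValueError("both classes must be present")
    values = sorted(set(vas_means))
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]
    best = None
    for cutoff in candidates:
        pairs = list(zip(vas_means, positives))
        tp = sum(1 for v, p in pairs if p and v >= cutoff)
        fn = sum(1 for v, p in pairs if p and v < cutoff)
        tn = sum(1 for v, p in pairs if not p and v < cutoff)
        fp = sum(1 for v, p in pairs if not p and v >= cutoff)
        sensitivity = tp / (tp + fn)
        specificity = tn / (tn + fp)
        efficiency = (sensitivity + specificity) / 2
        if best is None or efficiency > best[1]:
            best = (cutoff, efficiency, sensitivity, specificity)
    return best


# Hypothetical data: mean visual analog ratings (mm) and modal degrees.
vas_means = [2, 5, 8, 9, 15, 20, 30, 40]
degrees = [0, 0, 0, 1, 1, 1, 2, 2]
print(best_cutoff(vas_means, degrees, min_degree=1))
```

Repeating the search with `min_degree` set to 2 and 3 would give the remaining two cutoffs under the same assumptions.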

RESULTS
Of the 150 voices evaluated, 21 were excluded for noncompliance with the sample inclusion criteria: 3 for lack of consensus on the predominant feature of vocal deviation, 10 because of a difference greater than 10 mm between two or more judges, 4 for absence of a predominant vocal characteristic, 3 because tension was the predominant parameter in voice quality, and 1 because instability was the predominant vocal parameter. Thus, this study used 129 voices.
Inter-judge agreement, analyzed with the ICC, was excellent for both the visual analog scale (ICC=0.85) and the numerical scale (ICC=0.77). Intra-judge agreement was also considered excellent, with ICC values between 0.87 and 0.93 for the visual analog scale and between 0.83 and 0.88 for the numerical scale.
Cutoff values, together with sensitivity, specificity, and efficiency, are given in Table 1 for roughness and in Table 2 for breathiness; the cutoff value is the one with the highest efficiency. Table 3 gives the interval corresponding to each degree of the evaluated parameters.

DISCUSSION
Studies show that the visual analog scale is more sensitive to small differences (10,11,13) and yields higher agreement among judges than the numerical scale (13). On the other hand, its results are difficult to represent qualitatively, and most of the time they are tied to regions that represent the degrees of vocal deviation (14) (absent, mild, moderate, and severe). There is therefore a need for a correspondence between the visual analog scale and the numerical scale for better interpretation of auditory-perceptual evaluations, since it allows the clinician to map a quantitative measure, with a range of 100 points, onto intervals that qualitatively represent the degrees of the evaluated parameters.
In this context, some studies focused on the overall degree of vocal deviation (10,16,17) and reached similar results for the cutoff value of normal variability on the visual analog scale, derived from the numerical scale. This shows that, for the overall degree, auditory-perceptual evaluation with these techniques is a robust method. This work, however, set out to determine the cutoff values for the degrees of presence of the specific parameters roughness and breathiness, which were selected because they had the highest consensus among judges (21). Simberg et al. (16), Yamasaki et al. (10), and Vieira et al. (17) established a cutoff value on the visual analog scale using a 4-point numerical scale in which, for the overall voice deviation, the value 0 (zero) represents normal variability. Yamasaki et al. (10) and Vieira et al. (17) also found cutoff values for the remaining degrees, with degree 1 representing a mild voice deviation, 2 a moderate deviation, and 3 a severe deviation. Simberg et al. found 34.5 mm as the cutoff value for normal variability of the voice deviation (16), while Yamasaki et al. (10) and Vieira et al. (17) found 35.5 mm.
The logic used in this study to classify voice quality differs from that of the other studies: it did not analyze the overall degree (G) of vocal deviation but the absence/presence of the specific parameters evaluated, which limits the comparison of results with the studies found in the literature. In this study, 0 (zero) represents the absence of the evaluated parameter, 1 the presence of a mild degree, 2 the presence of a moderate degree, and 3 the presence of a severe degree. This methodology was adopted because of the nature of the specific parameters, which differ according to gender and vocal frequency. Male voices, or lower-pitched voices regardless of gender, are known to be more prone to roughness (22), whereas female voices are more prone to breathiness because of glottal proportion and laryngeal shape (23). Thus, to estimate the cutoff values for the different degrees on the visual analog scale independently of gender and vocal frequency, deviation from neutrality was considered instead of normal variability.
In this context, point 0 (zero) of the numerical scale was found to range from 0 to 8.5 mm on the visual analog scale, for both roughness and breathiness. This result is expected and reveals a limitation of the 4-point scale: when required to choose between 0 (absence of the evaluated parameter) and 1 (mild presence), the judge may choose 0 if the presence of the parameter is negligible. In addition, studies consider variations of up to 10 mm to be irrelevant on the 100 mm visual analog scale.
Regarding degree 1 of the numerical scale, the range extends to about 30 mm for both parameters evaluated (Table 3), close to the cutoff reported for normal variability of the overall degree (10,16). This result may indicate the normal variability of roughness and breathiness and should be investigated further in future studies.
Regarding degree 3, particularly for breathiness (52.5 mm), the cutoff is lower than that reported for the overall degree of voice quality in the literature (10,16,17). However, evaluating roughness and breathiness does not necessarily have the same impact as evaluating the overall degree of voice quality, and judges may be more critical when evaluating these specific parameters in isolation, which may explain the cutoff value found for the severe degree of roughness and breathiness. In addition, evaluating the absence or the degrees of presence of these parameters is different from evaluating normal variability and degrees of dysphonia. Another factor to consider is the type of voice sample used for the evaluation. This study used the sustained vowel, with a view to future association with acoustic analysis, whereas the studies found in the literature used connected speech samples (10,17). The type of sample, speech or sustained vowel, is known to contribute to the variability of auditory-perceptual evaluation (5,7). In sustained vowels, subglottic and supraglottic conditions are relatively constant, whereas continuous speech shows temporal and spectral variations caused by word onsets and offsets, pauses, voiceless phonemes, phonetic context, prosodic fluctuations in fundamental frequency and intensity, and speech rate, among others (5).
Inter- and intra-judge agreement was rated as excellent for both scales, that is, the ICC value was greater than 0.75. However, the ICC value was higher for the visual analog scale, for both inter- and intra-judge agreement, indicating better agreement rates with the visual analog scale. This result is consistent with that found by Yamasaki et al. (10) and Kreiman (13) for auditory-perceptual evaluation of voice. In other areas of perceptual evaluation, such as self-assessment of pain, the 100 mm visual analog scale is the most widely used, showing good reliability and practicality in application (24,25). In addition, some studies, such as that of Ferraz (26), show that among three types of scale (visual analog, verbal, and numerical), the visual analog scale had the highest agreement between application and reapplication of the scale in pain self-assessment.
As for the comparison between the auditory-perceptual evaluation of roughness and of breathiness (Tables 1 and 2), the degrees of intensity of these parameters were similar. For future work, we suggest establishing the cutoff value for normal variability of these parameters by sex, to measure the differences in normal range known to exist between the sexes for roughness and breathiness.

CONCLUSION
It was possible to establish cutoff values for the different degrees of roughness and breathiness on the visual analog scale from the numerical scale. Level 0 (zero) of the numerical scale, which is the absence of the parameter, corresponds to a small range of scores on the visual analog scale, while level 3 of the numerical scale corresponds to an extensive range of the visual analog scale.
The cutoff values found were: 8.5 mm for the mild degree of both parameters evaluated; 28.5 mm for the moderate degree of roughness and 33.5 mm for the moderate degree of breathiness; and 59.5 mm for the severe degree of roughness and 52.5 mm for the severe degree of breathiness.
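As an illustration of how a clinician might apply these cutoffs, the sketch below converts a visual analog score in millimeters into a degree from 0 to 3. The cutoff values are those reported in this study; the function and variable names are hypothetical, and assigning a score exactly equal to a cutoff to the higher degree is an assumption.

```python
from bisect import bisect_right

# Cutoffs (mm) for the mild, moderate, and severe degrees, as reported in this study.
CUTOFFS_MM = {
    "roughness": (8.5, 28.5, 59.5),
    "breathiness": (8.5, 33.5, 52.5),
}
DEGREE_LABELS = ("absent", "mild", "moderate", "severe")


def degree_from_vas(score_mm, parameter):
    """Return the 4-point degree (0-3) for a 0-100 mm visual analog score.

    Assumption: a score equal to a cutoff is assigned to the higher degree.
    """
    if not 0 <= score_mm <= 100:
        raise ValueError("score must be between 0 and 100 mm")
    return bisect_right(CUTOFFS_MM[parameter], score_mm)


for parameter in ("roughness", "breathiness"):
    degree = degree_from_vas(55, parameter)
    print(parameter, degree, DEGREE_LABELS[degree])
```

With a score of 55 mm, the sketch returns a moderate degree for roughness and a severe degree for breathiness, which reflects the lower severe cutoff found for breathiness.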
We also conclude from the cutoff values that the different degrees of the evaluated parameters occur at close values, defining a similar classification for roughness and breathiness. In addition, this study allows the clinician to relate quantitative values to qualitative concepts, making it easier to interpret the results of auditory-perceptual evaluation with the visual analog scale, both for screening and for auditory-perceptual evaluation of roughness and breathiness.

Table 1. Cutoff values for the different degrees of roughness on the 4-point scale, with the respective values of sensitivity, specificity, and efficiency

Table 2. Cutoff values for the different degrees of breathiness on the 4-point scale, with the respective values of sensitivity, specificity, and efficiency

Table 3. Distribution intervals of the degrees of roughness and breathiness on the 4-point scale