Severity of voice disorders: integration of perceptual and acoustic data in dysphonic patients

Purpose: To analyze the correlation between acoustic measures and the intensity of vocal deviation, and the discriminatory power of these measures in detecting the presence of vocal change, classifying the severity of the deviation, and differentiating the predominant voice type. Methods: The sample comprised 186 patients with dysphonia. Vocal deviation in the vowel /ε/ was evaluated by consensus among three speech-language pathologists, using a Visual Analog Scale. We extracted the mean and standard deviation (SD) of the fundamental frequency (F0), jitter, shimmer, and Glottal-to-Noise Excitation Ratio (GNE). Results: Mean F0 was negatively correlated with the degree of roughness and positively correlated with the degree of tension. The F0 SD was positively correlated with the overall, roughness, tension, and instability grades. Jitter and shimmer were positively correlated with all perceptual parameters. Only the GNE distinguished between healthy and dysphonic voices and graded the degree of breathiness. Shimmer separated rough voices from non-rough voices. The mean F0 was helpful to determine the degree of phonatory tension and to separate rough voices from breathy and strained voices. Conclusion: There is a correlation between the acoustic and auditory-perceptual measures. Shimmer, GNE, and F0 SD can be used to detect roughness, breathiness, and strain, respectively. GNE and mean F0 are useful to classify the degree of breathiness and strain, respectively. The mean F0 distinguished between rough, breathy, and strained voices, with rough voices being more severe than the other two.
DOI: 10.1590/2317-1782/20142013033
CoDAS 2014;26(5):382-8


INTRODUCTION
Voice is multidimensional (1), and its production is related to anatomic, physiological, emotional, organic, environmental, and behavioral features. Voice evaluation must therefore follow the same principle, mapping the aspects of voice production and correlating them so as to provide a comprehensive view of dysphonia.
The purpose of voice evaluation is to analyze voice quality, that is, whether the voice is healthy or not; to diagnose voice disorders; to monitor the progression of a disease or function; to evaluate prognosis; and to identify whether the individual is at risk of developing a disorder (2).
Overall, studies on voice evaluation and diagnosis have attempted to answer three essential clinical questions (2): How well can a measure determine the presence or absence of a voice disorder (diagnosis)? What is the evidence that the test used can determine the nature (etiology) of a voice disorder? How well can a measure determine the extent (severity) of a voice disorder?
In a study (3) conducted with experienced speech-language pathologists in the United States, all 53 interviewees reported using auditory-perceptual measurements in voice evaluation, followed by observation of body posture and movement and by investigation of vocal dynamics. These subjective methods were used significantly more often than objective evaluation by acoustic measurements.
However, in a systematic review, most studies (60%) on the evaluation of patients with voice disorders used acoustic measurements and focused on identifying the presence or absence of a disorder (78%). Few studies (18%) investigated the ability of a measure to quantify the severity of a voice disorder (2). In addition, most studies used laryngeal imaging as the reference standard to define the presence of a voice disorder.
Two points follow from this. First, one of the main applications of an evaluation measure is to judge the effectiveness of a treatment, whose outcome may range from the absence of a previously diagnosed disease to a reduction in the severity of vocal deviation. Second, given the variety of etiologic factors and manifestations of a voice disorder, laryngeal imaging cannot always be used as the reference for the presence or absence of a voice disorder, either at initial diagnosis or in pre- and post-intervention evaluations, since voice disorders may also be characterized by different adjustments of the vocal tract that cannot be seen at laryngoscopy.
Auditory-perceptual evaluation of voice quality is known to be considerably difficult, since the judgment of these parameters depends on many subjective factors, such as the internal references of each evaluator, the scale used and its sensitivity and specificity, and the attention and fatigue of the listener (1,4,5). However, it is the main tool used by speech-language pathologists to assess voice quality.
Considering the need for further studies investigating the capacity of acoustic measurements to determine the severity of vocal deviation, and the small number of studies that take auditory-perceptual analysis as the reference standard for the presence or absence of a voice disorder (2,4-6), the aim of this study was to analyze the correlation between acoustic measurements and the severity of vocal deviation, as well as the discriminatory power of these measurements in detecting the presence of voice change, classifying the severity of the deviation, and differentiating the predominant voice type.

METHODS

Study design
This is a quantitative, explanatory, cross-sectional field study approved by the research ethics committee of the Center for Health Sciences of Universidade Federal da Paraíba (UFPB), protocol no. 52492/12. All participants signed the informed consent form to authorize the use of their data.

Sample
The sample was composed of 186 patients with dysphonia of both genders (116 women and 70 men), aged 19-60 years, who were seen at the Voice Laboratory of the Department of Speech and Hearing Therapy of UFPB between August 2012 and March 2013.
Patients older than 18 and younger than 65 years who had voice complaints and had undergone a laryngeal examination with a diagnostic report were recruited. Patients with cognitive or neurologic disorders that could impair the recordings were excluded. The study group included patients with normal larynx, benign lesions of the vocal folds (nodules, cysts, polyps), primary muscle tension dysphonia, and unilateral vocal fold paralysis.

Material
An HP laptop was used to record the voices, along with a Logitech headset microphone and the software FonoView 4.6h (CTS Informática). The sampling rate was 44,100 Hz.

Procedures
Data were collected in a quiet room, with environmental noise below 50 dB SPL, measured with a digital sound pressure level meter. The microphone was placed 10 cm from the patient's lips.
Audio recordings were made at the patient's first evaluation, before voice therapy, in 5-minute sessions in which patients were asked to sustain the vowel /ε/ for as long as possible (maximum phonation time).
Afterwards, the voice samples were edited in the software Sound Forge 10.0 to delete the initial and final seconds of each emission, preserving a minimum of 3 seconds of each recording. Normalization was done with the "normalize" feature of Sound Forge, in peak level mode, to standardize the audio output between -6 and 6 dB.
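The editing steps above can be illustrated with a minimal sketch. It is not the Sound Forge procedure itself: the function names, the 1-second trim, and the -6 dB peak target (relative to a full scale of 1.0) are assumptions chosen for illustration, since the text does not state how many seconds were removed or how the "-6 and 6 dB" range was applied.

```python
def trim_edges(samples, sample_rate, trim_s=1.0, min_keep_s=3.0):
    """Drop trim_s seconds from both ends (trim_s is a hypothetical value)."""
    n_trim = int(round(trim_s * sample_rate))
    kept = samples[n_trim:len(samples) - n_trim]
    if len(kept) < min_keep_s * sample_rate:
        raise ValueError("recording too short to keep the minimum duration")
    return kept


def peak_normalize(samples, target_db=-6.0):
    """Scale samples so the largest absolute value sits at target_db re 1.0."""
    peak = max(abs(s) for s in samples)
    if peak == 0:
        return list(samples)  # silence: nothing to scale
    gain = 10 ** (target_db / 20) / peak
    return [s * gain for s in samples]


if __name__ == "__main__":
    # Toy signal at 10 samples per second, 5 seconds long (illustrative only).
    signal = [0.01 * (i % 7) for i in range(50)]
    edited = peak_normalize(trim_edges(signal, sample_rate=10))
    print(len(edited), max(abs(s) for s in edited))
```

Peak normalization applies one gain to the whole file, so it changes the level but not the cycle-to-cycle ratios on which jitter and shimmer are based.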
Auditory-perceptual analysis of the voice recordings was done with a 0-100 mm Visual Analog Scale (VAS) to assess the overall level (OL) of vocal deviation, hoarseness level (HL), breathiness (air escape) level (AL), strain level (SL), and instability level (IL): values closer to 0 indicate smaller deviations, and values closer to 100 indicate larger deviations. The assessment was made by consensus among three voice specialists.
Voice analyses were carried out in a quiet room. Evaluators were first instructed to consider a voice healthy when it was socially acceptable and produced naturally, without effort or noise, and with stable emission. They were also instructed to associate hoarseness with irregular vibration, breathiness with audible air escape, muscle tension with perceived vocal strain, and instability with fluctuation in voice quality, frequency, and/or intensity. They were previously trained with anchor stimuli of adequate voice production and of vocalizations with different levels of deviation, as well as predominantly hoarse, breathy, strained, and unstable voices.
Each vocalization was presented three times through a loudspeaker at an intensity comfortable for the evaluators. They then identified the presence or absence of voice deviation, the predominant type of deviated voice (hoarse, breathy, strained, or unstable), and the deviation level.
At the end of the auditory-perceptual analysis, 10% of the voice samples were randomly repeated to assess the reliability of the consensus evaluation with Cohen's Kappa coefficient. The Kappa value was 0.80, indicating agreement between evaluators. The intra-evaluator Kappa value was 0.79, which also indicated agreement.
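As a reminder of what the Kappa coefficient measures, the sketch below computes Cohen's Kappa for two sets of categorical judgments of the same samples (for example, the first and the repeated rating of the 10% subsample). The function name and the example labels are hypothetical; the study's own values were obtained with its statistical software.

```python
from collections import Counter


def cohens_kappa(ratings_a, ratings_b):
    """Cohen's Kappa: agreement between two ratings, corrected for chance."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(ratings_a)
    # Proportion of samples on which the two ratings coincide.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Agreement expected by chance from the marginal category frequencies.
    count_a, count_b = Counter(ratings_a), Counter(ratings_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1:
        return 1.0  # both ratings use a single identical category
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    first = ["hoarse", "breathy", "hoarse", "strained", "breathy", "hoarse"]
    repeat = ["hoarse", "breathy", "hoarse", "strained", "hoarse", "hoarse"]
    print(round(cohens_kappa(first, repeat), 2))
```

Because Kappa subtracts chance agreement, it is lower than the raw proportion of matching judgments whenever one category dominates.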
Acoustic measurements were made in the software VoxMetria 4.7h (CTS Informática), in voice quality measurement mode. The mean and SD of the fundamental frequency (F0), jitter, shimmer, and Glottal-to-Noise Excitation Ratio (GNE) of the sustained vowel were used in the evaluation.
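For readers unfamiliar with these measures, the sketch below shows the common "local" definitions of jitter and shimmer (mean absolute difference between consecutive cycles, relative to the mean, in percent) and the mean and SD of F0 derived from cycle periods. These are textbook definitions used for illustration; the exact algorithms implemented in VoxMetria are not described in the text and may differ, and the function names and the use of the sample SD are assumptions.

```python
import statistics


def local_perturbation_percent(values):
    """Mean absolute cycle-to-cycle difference relative to the mean, in %."""
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    mean_value = sum(values) / len(values)
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    return 100.0 * (sum(diffs) / len(diffs)) / mean_value


def f0_mean_and_sd(periods_s):
    """Mean and sample SD of F0 (Hz) from glottal cycle periods in seconds."""
    f0 = [1.0 / p for p in periods_s]
    return statistics.mean(f0), statistics.stdev(f0)


if __name__ == "__main__":
    periods = [0.0050, 0.0051, 0.0049, 0.0050, 0.0052]  # hypothetical cycles
    amplitudes = [0.80, 0.78, 0.82, 0.79, 0.81]         # hypothetical peaks
    print("jitter %:", local_perturbation_percent(periods))
    print("shimmer %:", local_perturbation_percent(amplitudes))
    print("F0 mean, SD:", f0_mean_and_sd(periods))
```

Jitter is obtained by passing cycle periods and shimmer by passing cycle peak amplitudes; both depend on a prior estimate of the glottal cycles.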
The reference values in the software for F0 SD, jitter, shimmer, and GNE were 0.2 Hz, 0.6%, 6.5%, and 0.5, respectively. For F0 SD, jitter, and shimmer, values above the reference are considered altered. Conversely, for GNE, values below 0.5 are considered altered.
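The cutoff logic above can be written as a small check. This is only a sketch: it reads the listed thresholds as F0 SD 0.2 Hz, jitter 0.6%, shimmer 6.5%, and GNE 0.5, which is an interpretation of the reference values given in the text, and the dictionary keys and function name are hypothetical.

```python
# Assumed reading of the reference values: (threshold, direction of alteration).
REFERENCE = {
    "f0_sd_hz": (0.2, "above"),
    "jitter_pct": (0.6, "above"),
    "shimmer_pct": (6.5, "above"),
    "gne": (0.5, "below"),
}


def flag_alterations(measures):
    """Return {measure: True if altered} according to the assumed cutoffs."""
    flags = {}
    for name, value in measures.items():
        threshold, direction = REFERENCE[name]
        if direction == "above":
            flags[name] = value > threshold
        else:
            flags[name] = value < threshold
    return flags


if __name__ == "__main__":
    patient = {"f0_sd_hz": 0.15, "jitter_pct": 0.9, "shimmer_pct": 6.5, "gne": 0.42}
    print(flag_alterations(patient))
```

GNE is the only measure whose direction is reversed: lower values indicate more noise, so the alteration lies below the cutoff.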

Data analysis
Descriptive statistics were computed for all variables, and Spearman's correlation test was used to assess the relationship between the severity of voice deviation (adequate, mild, moderate, and severe) and the acoustic measurements.
A correlation coefficient quantifies the strength and direction of the association between two variables, that is, whether they vary together and to what extent; Spearman's coefficient does so for monotonic relationships and is computed on ranks. The coefficient ranges from -1 to 1: negative values indicate that the variables change in opposite directions, whereas positive values indicate that they change in the same direction.
In this study, values of 0.1-0.3 were considered to represent weak correlation, values between 0.4 and 0.6 moderate correlation, and values above 0.7 strong correlation (8).
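The following sketch shows how Spearman's coefficient can be computed (Pearson's correlation applied to average ranks, which handles ties) and how the strength bands above can be applied. The function names are hypothetical, and two choices are assumptions not stated in the text: the coefficient is rounded to one decimal before banding, so that values falling between the published bands are assigned to the nearest one, and values below 0.1 are labeled "negligible".

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: Pearson's correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def strength_label(rho):
    magnitude = round(abs(rho), 1)  # assumption: band on one-decimal values
    if magnitude >= 0.7:
        return "strong"
    if magnitude >= 0.4:
        return "moderate"
    if magnitude >= 0.1:
        return "weak"
    return "negligible"  # label not used in the text


if __name__ == "__main__":
    vas_overall = [12, 35, 35, 48, 60, 77]   # hypothetical VAS scores (mm)
    jitter_pct = [0.3, 0.9, 0.7, 1.1, 1.0, 2.4]  # hypothetical jitter values
    rho = spearman(vas_overall, jitter_pct)
    print(rho, strength_label(rho))
```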
Analysis of variance (ANOVA) was used to compare the acoustic measurements according to the level of vocal deviation and the predominant voice type, followed by post hoc analysis with the Scheffé test.
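A minimal sketch of this two-step comparison is given below: a one-way ANOVA F statistic, followed by Scheffé statistics for each pair of groups. The standard library has no F distribution, so the sketch returns statistics to be compared with a critical F value taken from a table or statistics package; a pair differs when its Scheffé statistic exceeds the critical F for (k - 1, N - k) degrees of freedom. The function names and data are hypothetical, and the study itself used SPSS.

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within, ms_within, group_means)."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between, df_within = k - 1, n_total - k
    ms_within = ss_within / df_within
    f_stat = (ss_between / df_between) / ms_within
    return f_stat, df_between, df_within, ms_within, means


def scheffe_pairwise(groups):
    """Scheffé statistic per pair: (mean_i - mean_j)^2 / (MSW*(1/n_i+1/n_j)*(k-1))."""
    _, df_between, _, ms_within, means = one_way_anova(groups)
    stats = {}
    for i in range(len(groups)):
        for j in range(i + 1, len(groups)):
            variance = ms_within * (1 / len(groups[i]) + 1 / len(groups[j]))
            stats[(i, j)] = (means[i] - means[j]) ** 2 / (variance * df_between)
    return stats


if __name__ == "__main__":
    # Hypothetical values of one acoustic measure in three deviation levels.
    levels = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
    print(one_way_anova(levels)[0])
    print(scheffe_pairwise(levels))
```

For this toy data (three groups of three, so 2 and 6 degrees of freedom), the critical F at the 0.05 level is about 5.14, and each pairwise statistic is compared against that value.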
An acoustic measure was considered to discriminate between the presence and absence of vocal deviation when its mean value differed between adequate voices (level 1) and deviated voices (levels 2, 3, and 4).
An acoustic measure was considered able to classify the severity of vocal deviation when its mean values differed between levels 1 and 2, levels 2 and 3, and levels 3 and 4.
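The two criteria above can be expressed as a simple decision rule over the pairs of levels that differed in the post hoc analysis. This is a sketch under one stated assumption: "differed between adequate and deviated" is read as level 1 differing from each of levels 2, 3, and 4. The function name and the input format are hypothetical.

```python
def discriminatory_power(significant_pairs):
    """Apply the two criteria to the level pairs found significant post hoc."""
    pairs = {frozenset(p) for p in significant_pairs}

    def differs(a, b):
        return frozenset((a, b)) in pairs

    return {
        # Assumed reading: level 1 differs from every deviated level.
        "detects_presence": all(differs(1, level) for level in (2, 3, 4)),
        # Every pair of adjacent levels differs.
        "classifies_severity": all(
            differs(a, b) for a, b in ((1, 2), (2, 3), (3, 4))
        ),
    }


if __name__ == "__main__":
    # Hypothetical post hoc result: level 1 differs from levels 2, 3, and 4 only.
    print(discriminatory_power([(1, 2), (1, 3), (1, 4)]))
```

Under this rule a measure can detect the presence of deviation without being able to grade it, which is the distinction the two criteria are meant to capture.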
All analyses were made using the software Statistical Package for the Social Sciences (SPSS) 2.0, with significance level set at 0.05.

RESULTS
Comparison between groups according to air escape level showed differences in the mean values of jitter (p=0.004) and GNE (p<0.001) (Table 4).
Comparison of instability levels showed differences in the mean values of F0 SD (p<0.001), jitter (p<0.001), shimmer (p<0.001), and GNE (p=0.006) (Table 4). However, in post hoc analysis, none of the measurements met the criteria established as reliable for diagnosis and monitoring of voice changes.
Comparing groups according to the predominant voice type among deviated voices, the mean F0 (p<0.001) and GNE (p=0.039) values were found to differ (Table 5). In post hoc analysis, mean F0 distinguished hoarse from breathy voices (p<0.001), with higher values for breathy voices. Differences were also found between hoarse and strained voices (p=0.002), with higher values for strained voices.

DISCUSSION
The association of auditory-perceptual and acoustic evaluations is of utmost importance to identify voice quality, voice deviation levels, and the results of treatment or surgery (9).
Acoustic measurements are able to determine changes in voice quality, as a relation between laryngeal abnormalities and the severity of voice disorders is expected (6).
In this context, this study aimed to analyze the correlation between acoustic measurements and the severity of voice disorders, and to identify the discriminatory power of such measures to detect the presence of vocal deviation, classify the severity of the disorder, and distinguish predominant voice types.
There was a weak correlation between mean F0 and the auditory-perceptual data, showing that patients with higher HLs have lower-pitched voices, whereas those with higher phonatory tension levels have higher-pitched voices.
The negative correlation between mean F0 and hoarseness may be explained by the direct relation between hoarseness and irregular vocal fold vibration, which may result from small edemas, vasodilation, fatigue, and voice misuse and abuse (10). Therefore, nodules, polyps, and edemas, which increase the mass of the vocal folds and cause vibratory irregularity, may decrease the fundamental frequency, thus lowering voice pitch (11,12). Regarding the correlation between mean F0 and phonatory tension, it is important to note that frequency is determined, among other factors, by vocal fold tension, which is controlled by the laryngeal muscles, notably the cricothyroid muscle (13). Patients with phonatory tension therefore usually present contraction of the extrinsic and intrinsic muscles, with greater longitudinal tension in the vocal folds, higher subglottic pressure, and vocal tract tightening, which increase the number of glottic cycles per second and, consequently, the fundamental frequency (14).
Although in the literature (15) OL and HL are the parameters most related to F0, in our study phonatory tension and hoarseness were more related to F0. Similar findings have been reported in studies (16,17) conducted with children, in which phonatory tension, OL, and HL were the parameters most related to F0.
In our study, F0 SD showed a slight positive relation to the OL of voice deviation and to the phonatory tension level, as well as a weak positive correlation with hoarseness and instability levels. Voices with more severe deviations and higher hoarseness, strain, and instability levels also had higher F0 SD values.
Physiologically, F0 SD is directly related to the neuromuscular condition of the vocal folds and to the regularity of their vibration. Acoustic and perceptual features are directly related to the timing of sound emission. Therefore, as histological changes in the vocal folds interfere with glottic vibration patterns, especially the mucosal wave, and cause a deviation in voice production, the correlation between perceptual features and F0 SD can be explained (11,15).
Jitter and shimmer values were positively related to all auditory-perceptual features, with higher values in voices with more severe deviations. Studies (18,19) have reported that jitter reflects the OL of voice deviation and is a sensitive measure for detecting voice quality deviation, which explains the higher values in the most deviated voices across all perceptual features.
Studies (6,20-22) combining a number of acoustic features with data from laryngeal examinations suggested that jitter and shimmer can be strong predictors of voice disorders, being able to detect mild changes that would normally go unnoticed in perceptual analysis.
Studies (18,19) have compared patients before and after voice therapy and reported a moderate correlation between acoustic features (jitter, shimmer, and harmonics-to-noise ratio) and perceptual analysis. The strongest correlations were found between OL and jitter and between OL and shimmer.
In our study, the mean and SD of F0, jitter, shimmer, and GNE differed across voice deviation levels. However, in post hoc analysis, and considering the criteria established to define a measurement as reliable to separate healthy from deviated voices and to grade deviation levels, GNE was the only measure able to distinguish healthy from deviated voices. No measure could classify the severity of voice deviation.
The mean values of all acoustic measurements differed across HLs. Only shimmer reliably distinguished hoarse voices. No measure reliably classified the severity of hoarseness.
In general, jitter and shimmer are used to describe the hoarseness found at perceptual evaluation and the vibratory irregularity at the physiological level, whereas noise measurements are used as indicators of air escape and inadequate glottal closure (6,18,23-27). One study reported that the auditory parameters for hoarseness were related to shimmer (28).
Regarding air escape, although mean jitter and GNE values differed, post hoc analysis showed that only GNE was reliable to classify the intensity of air escape and to distinguish breathy from healthy voices, according to the criteria established in our study.
GNE measures additive noise in voice production, regardless of the noise caused by the glottal mechanism. It indicates whether the voice signal originates from vocal fold vibration or from air flow in the vocal tract. It can take different values for different phonatory adjustments and different voice deviations (23). GNE may be considered a more reliable measurement because, unlike jitter and shimmer, it does not require a previous estimate of the fundamental frequency, which is very difficult to obtain in cases of severe laryngeal and voice deviations (23-25).
The literature (23) recommends noise measurements for voice assessment and screening, since they relate well to perceptual analysis, as found in our study. Studies (23,26) combining different acoustic parameters to describe deviated voices have shown that GNE is the most reliable measurement in independent analyses, as it distinguishes normal from deviated voices.
GNE is directly related to hoarseness and air escape, two of the most reliable parameters in perceptual evaluation. Therefore, based on our findings, GNE is reliable to detect voice changes and to detect and classify air escape.
The mean values of all acoustic measurements differed across phonatory tension levels. In post hoc analysis, and considering the previously established criteria, mean F0 was reliable to determine the intensity of voice strain, whereas F0 SD could distinguish normal from strained voices.
An increase in muscle tension can unbalance the system and, as a consequence, make voice production harder to control, which causes the fundamental frequency to oscillate and increases F0 SD values.
A study (29) using excised larynges investigated changes in subglottic pressure by comparing nonlinear dynamics with perturbation measurements. A significant increase in subglottic pressure caused vibration irregularities, bifurcation, hoarseness, and inadequate vocal effort. Thus, excessive phonatory tension with increased subglottic pressure or vocal tract tightening can also result in irregular vibration, reflected in F0 SD values.
As to phonatory instability, the mean and SD of F0, jitter, shimmer, and GNE differed across levels. However, in post hoc analysis, these measurements did not meet the criteria for detecting the presence of voice deviation or classifying its severity.
Finally, comparing the acoustic measurements according to the predominant voice type in cases of vocal deviation, hoarse voices differed from breathy and strained voices in F0 values, as hoarse voices had lower pitch than the others. This may be explained as follows: in strained voices, increased tightening of the intrinsic and extrinsic muscles makes the whole system rigid, increases pressure, and decreases vocal fold contact, so that a smaller portion of the fold vibrates; the same occurs with the inadequate glottic closure underlying breathy voices. Both mechanisms raise the pitch. Hoarseness, in turn, is directly related to the presence of a lesion on the free edge of the vocal fold, which lowers the pitch (12).
The biggest challenge in clinical practice and research is to understand which tool is best to assess acoustic measurements and correlate them with the perceptual and physiological dimensions (23,26). Based on our findings, there is a correlation between acoustic and auditory-perceptual measurements in the quantification of voice deviations. Considering the reliability of acoustic parameters to determine the presence or absence of voice changes, GNE was reliable to separate normal from altered voices and breathy from non-breathy voices, whereas shimmer identified the presence or absence of hoarseness, and F0 SD was able to identify strained voices.
Regarding the ability of acoustic measurements to predict the severity of voice deviation, GNE was able to classify air escape levels (AL), whereas mean F0 was able to grade phonatory tension levels.
Overall, measurements that classify voices as healthy or altered, especially with respect to OL, such as GNE, can be used in screening and diagnosis of voice disorders, while measurements able to identify the severity of the disorder can be reliable for monitoring during voice therapy.

CONCLUSION
There is a correlation between acoustic and auditory-perceptual measurements. GNE is reliable to distinguish adequate from deviated voices and to identify and classify AL. Shimmer may be used to detect the presence of hoarseness. Mean F0 is reliable to classify the intensity of phonatory tension, whereas F0 SD can be used to detect voice strain. Mean F0 was able to distinguish hoarse, breathy, and strained voices, and hoarse voices were considered the most severe compared with the others.

*LWL, DPC, and POC were in charge of the study design and development. LWL was responsible for data collection, result analysis, and final review of the paper. DPC was responsible for data collection and result analysis. POC was in charge of data tabulation, statistical analysis, and paper review.

Table 1. Distribution of voice parameters according to overall, hoarseness, air escape, strain, and instability levels

Table 2. Description of predominant voice types in patients with vocal deviation

Table 4. Comparison of acoustic measurements with severity of voice deviation
*Significant values (p≤0.05), ANOVA
Caption: NVVQ = normal variability of voice quality; SD = standard deviation; OL = overall level; F0 SD = fundamental frequency standard deviation; GNE = Glottal-to-Noise Excitation Ratio; HL = hoarseness level; AL = air escape level; SL = strain level; IL = instability level

Table 5. Comparison of acoustic measurements with predominant voice type