Influence of speech sample on perceptual rating of hypernasality

Accepted: September 17, 2015
Study carried out at Laboratório de Fisiologia do Hospital de Reabilitação de Anomalias Craniofaciais, Universidade de São Paulo – HRAC-USP, Bauru (SP), Brazil.
1 Hospital de Reabilitação de Anomalias Craniofaciais, Universidade de São Paulo – USP – Bauru (SP), Brazil.
Financial support: Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES) and Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP) (process no. 2013/14769-4).
Conflict of interests: nothing to declare.
ABSTRACT


INTRODUCTION
Individuals with cleft palate are at high risk of developing speech disorders such as hypernasality, nasal air emission, low intraoral pressure and compensatory articulations resulting from velopharyngeal dysfunction (VPD). Clinically, hypernasality is the most evident symptom of VPD in these individuals (1,2).
The assessment of velopharyngeal function is a difficult process due to its complexity and dynamic nature. Therefore, many authors have proposed different ways to categorize the speech of individuals with cleft palate in an attempt to establish a universal standard that allows multicenter studies (2-5). Although instrumental evaluations, such as videofluoroscopy, nasopharyngoscopy, nasometry and the pressure-flow technique, are essential for the diagnosis and management of VPD, the identification of speech symptoms is mainly performed by auditory perceptual assessment, which is considered the "gold standard" in assessing individuals with cleft palate and the main indicator of the clinical significance of these symptoms (1,6,7). However, due to its subjectivity, the evaluation may involve errors and variations, even when done by experienced professionals. The literature recommends that the perceptual assessment be based on audio and/or video recordings, so that results can be presented as an agreement between more than one evaluator regarding the presence and severity of the speech symptoms (2,5,6,8-10). Among the factors that can affect the perceptual judgment of hypernasality, the type of speech sample remains one of the most relevant. Some authors believe that hypernasality is identified only during spontaneous conversation or is considered to be more severe in this type of sample (11,12). As spontaneous speech lengthens, additional demands, such as muscle fatigue of the velopharyngeal structures, make hypernasality more noticeable (12). This means that one individual may present different degrees of hypernasality depending on the speech sample being analyzed, suggesting that the results of different raters are only comparable when the same speech sample is used. This fact has led many researchers to propose the standardization of the speech characteristics that should be included in the perceptual assessment of individuals with cleft palate in order to minimize the influence of various factors on the assessment of hypernasality and improve the reliability of this method (4,9,13,14).
This study aimed to investigate the influence of the speech sample (spontaneous conversation or repeated sentences) on the perceptual judgment of hypernasality in individuals with repaired cleft palate. Ultimately, the study aimed to investigate which speech sample makes the hypernasality judgment the most reliable with regard to intra- and inter-rater agreement.

METHODS

Speech samples
This study was approved by the Human Research Ethics Committee of the Institution (nº 1.008.414). The study included 120 audio-recorded speech samples (60 containing spontaneous conversation and 60 containing repeated sentences) from 60 patients, i.e., two speech samples from each individual were analyzed. The patients had repaired cleft palate, associated or not with cleft lip, were of both genders, were aged 6 to 52 years (mean 21 ± 10 years), and presented with or without VPD.
Samples containing spontaneous conversation were obtained from personal answers to general questions adapted to the age of each individual, in order to obtain a speech sample long enough to allow the perceptual analysis of hypernasality. Samples with repeated sentences were composed of 11 standard sentences containing exclusively oral sounds. All samples were selected from digital audio recordings routinely performed in an acoustically treated room and stored in the database of the Institution. Consent for data usage was obtained from all patients or their guardians upon registration in the hospital. Only recordings with good audio quality and with no noise that could compromise the analysis were included. However, samples containing other speech symptoms, such as audible nasal air emission, compensatory articulations, nasal snoring and dysphonia, were not excluded.

Procedures
The recordings were retrieved from the database, saved in MP3 format and edited to remove the professional's speech from the recording and to standardize the duration to a minimum of 15 seconds and a maximum of 34 seconds. After editing, the speech samples were numbered and randomly copied onto two compact discs (CDs), one containing the samples of spontaneous conversation and the other containing the samples of repeated sentences. In order to analyze intra-rater agreement, 30% of the samples were duplicated, randomized and included in the CDs, taking care that duplicated samples were not included in the same CD as their originals, so that the raters would not identify them.
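The duplication step described above can be sketched as follows. This is only an illustration under stated assumptions: the function name `split_for_reliability`, the sample identifiers and the fixed random seed are hypothetical and were not part of the study, which does not report how the randomization was carried out.

```python
import random


def split_for_reliability(sample_ids, fraction=0.30, seed=0):
    """Return a shuffled presentation order plus a shuffled subset to be
    presented a second time (kept apart from the originals).

    Hypothetical helper: the study does not describe its randomization tool.
    """
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    main_order = list(sample_ids)
    rng.shuffle(main_order)
    n_repeats = round(len(main_order) * fraction)
    # sample() draws without replacement, so no recording is duplicated twice
    repeat_order = rng.sample(list(sample_ids), n_repeats)
    return main_order, repeat_order


# 60 hypothetical identifiers for one type of speech sample
ids = [f"S{i:02d}" for i in range(1, 61)]
main_order, repeat_order = split_for_reliability(ids)
print(len(main_order), len(repeat_order))  # 60 18
```

With 60 recordings per sample type, a 30% fraction yields the 18 duplicated recordings per type used for the intra-rater analysis.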

Perceptual analysis of hypernasality
Hypernasality was judged in two stages by three speech therapists experienced in the perceptual assessment of individuals with cleft palate. At first, the raters analyzed the samples containing spontaneous conversation and, after one month, the samples from the same patients containing sentences. Although these are different speech samples, this time interval between the two stages was set in order to prevent the raters from identifying the patients. In both analyses, the evaluators rated hypernasality according to their own criteria (internal standard) using the following 4-point scale: 1 = absence of hypernasality (normal resonance), 2 = mild hypernasality, 3 = moderate hypernasality and 4 = severe hypernasality. As recommended, the analyses were performed individually using stereo headphones made available for the study. Raters were allowed to listen to the recordings as many times as necessary.

Data analysis
Hypernasality was expressed as a score, according to the 4-point scale. Intra- and inter-rater agreements were established for the two types of speech samples (spontaneous conversation and repeated sentences) using the Kappa coefficient, considering the following strength of agreement: <0 = no agreement; 0-0.19 = poor agreement; 0.20-0.39 = fair agreement; 0.40-0.59 = moderate agreement; 0.60-0.79 = substantial agreement; 0.80-1.00 = almost perfect agreement (15). The intra-rater agreement coefficient was established based on the repeated analysis of 30% of the total samples (36 samples, with 18 containing spontaneous conversation and 18 containing repeated sentences). The intra- and inter-rater agreement coefficients obtained in each stage were compared using the Z test. Values of p<0.05 were accepted as statistically significant.
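As an illustration of how a Kappa coefficient and its interpretation can be obtained from two sets of scores on the 4-point scale, the sketch below computes the unweighted Cohen's Kappa for a pair of rating lists. It is an assumption that this variant matches the one used in the study, which does not state the exact Kappa formulation or software; the function names and the example scores are hypothetical. Only the interpretation bands are taken from the text above.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Unweighted Cohen's Kappa for two equally long lists of category scores."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("rating lists must be non-empty and of equal length")
    n = len(ratings_a)
    # Observed proportion of identical scores
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Agreement expected by chance, from each list's category frequencies
    count_a, count_b = Counter(ratings_a), Counter(ratings_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return float("nan")  # undefined when both lists use a single category
    return (observed - expected) / (1.0 - expected)


def interpret_kappa(kappa):
    """Map a Kappa value to the strength-of-agreement bands listed above."""
    if kappa < 0:
        return "no agreement"
    if kappa < 0.20:
        return "poor"
    if kappa < 0.40:
        return "fair"
    if kappa < 0.60:
        return "moderate"
    if kappa < 0.80:
        return "substantial"
    return "almost perfect"


# Hypothetical scores (1 = absent ... 4 = severe) from a first and second listening
first = [1, 2, 3, 4, 1, 2]
second = [1, 2, 3, 4, 2, 2]
k = cohen_kappa(first, second)
print(round(k, 2), interpret_kappa(k))  # 0.77 substantial
```

The same function applies to intra-rater agreement (one rater, two listenings) and to agreement between a pair of raters; agreement among all three raters at once would require a multi-rater statistic, which is not sketched here.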

RESULTS

Intra-rater agreement
The intra-rater agreement on the degree of hypernasality obtained in the analysis of repeated sentences was higher than that observed in the samples containing spontaneous conversation for all three raters, and significantly so for two of them, as shown in Table 1. For rater 1, the Kappa coefficient significantly increased from 0.45 (moderate) to 1.00 (almost perfect), for spontaneous conversation and repeated sentences, respectively (p<0.001). For rater 2, the Kappa coefficient also increased, from 0.60 to 0.74 for spontaneous conversation and repeated sentences, respectively, both interpreted as substantial, but with no significant difference (p = 0.590). For rater 3, there was a significant increase of the Kappa coefficient from 0.44 (moderate) to 0.92 (almost perfect) for spontaneous conversation and repeated sentences, respectively (p = 0.006).
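For readers unfamiliar with comparing two Kappa coefficients, one common form of the Z test divides the difference between the coefficients by the square root of the sum of their squared standard errors. The sketch below assumes this form and assumes the two coefficients can be treated as independent; the study does not specify which variant was used, and the standard errors in the example are hypothetical, since none are reported in the text.

```python
import math


def compare_kappas(kappa_1, se_1, kappa_2, se_2):
    """Two-sided Z test for the difference between two Kappa coefficients.

    Assumes the coefficients are independent and approximately normal.
    Returns (z, p_value).
    """
    z = (kappa_1 - kappa_2) / math.sqrt(se_1 ** 2 + se_2 ** 2)
    # Two-sided p-value from the standard normal distribution
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


# Hypothetical coefficients and standard errors, for illustration only
z, p = compare_kappas(0.50, 0.10, 0.80, 0.10)
print(f"z = {z:.2f}, p = {p:.3f}")
```

Under the criterion stated in the Data analysis section, a difference would be taken as significant when the resulting p-value is below 0.05.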

Inter-rater agreement
The inter-rater Kappa coefficients (Table 2) were 0.40 for spontaneous conversation and 0.38 for repeated sentences, indicating moderate and fair agreement, respectively. Data analysis showed no difference between the coefficients of the two stages (p = 0.970).
When the agreement between each pair of raters was analyzed separately, the results showed an increase of the Kappa coefficient between raters 1 and 2 from 0.37 (spontaneous conversation), interpreted as fair, to 0.43 (repeated sentences), interpreted as moderate, with no significant difference (p = 0.628). For raters 1 and 3, there was a slight reduction of the Kappa coefficient from 0.48 in spontaneous conversation to 0.42 in repeated sentences, both interpreted as moderate and with no significant difference (p = 0.663). The comparison between raters 2 and 3 also showed a slight reduction of the Kappa coefficient, from 0.34 for spontaneous conversation to 0.31 for repeated sentences, both interpreted as fair, and this difference was not significant (p = 0.876).

DISCUSSION
In the present study, the comparison of intra-rater agreement coefficients between the two stages showed better agreement for the repeated sentences than for the spontaneous conversation for the three evaluators. Statistically significant differences were found for two of them. One can speculate that the perceptual judgment of hypernasality in spontaneous conversation is harder to perform due to the influence of several factors, such as context, rhythm of speech, pitch and compensatory articulation. According to the literature, in the presence of other speech symptoms, it is difficult for the rater to isolate hypernasality, often leading to more severe ratings (4,5,10-12). In addition, some authors believe that there is not always a clear distinction between passive errors, such as hypernasality, and compensatory articulations (2). Based on previous analysis of the speech samples of this study, it was found that approximately 50% ( The fact that the samples with repeated sentences presented a lower proportion of coexisting speech symptoms may have favored, and thus made more reliable, the judgment of hypernasality in this sample. Significant intra-rater agreement using repeated sentences and standardized words was shown in previous studies from the Institution, ranging from substantial to almost perfect (16,17), moderate to almost perfect (18,19) and fair to almost perfect (20). Other studies present percentages of intra-rater agreement above 80% (21-23). A similar result was found in a study comparing nasalance scores with the results of perceptual speech assessment (spontaneous conversation and repeated sentences). In the intra-rater analysis of experienced listeners, the authors reported percentages of agreement ranging from 62.5% to 100% for spontaneous speech and from 75% to 100% for repeated sentences (24).
It is also known that the speech material and the elicitation technique may influence the speech intelligibility score obtained from the perceptual assessment of speech, and that significant differences may exist between the production of a word obtained from the repetition of sentences and from spontaneous conversation (14). It can be speculated, then, that eliciting the speech sample by repetition facilitated the identification of hypernasality. In the case of repeated sentences, the individual being evaluated tends to reproduce the speech of the evaluator, thus exerting better control over the rhythm of speech and articulation in order to produce the correct sounds, which does not occur in spontaneous conversation (8).
Although some authors (11) advocate that spontaneous conversation is an important tool for perceptual speech assessment since it reflects the individual's daily life, the use of sentence repetition facilitates the perceptual analysis of speech, since it constitutes a more accurate type of speech sample. In proposing universal parameters for the documentation of speech in individuals with cleft palate, experts recommend the use of repeated sentences and single words for the perceptual judgment of hypernasality, as they are comparable even between different languages with similar phonetic context (4). These same authors also suggest that spontaneous speech be used for rating characteristics other than the degree of hypernasality, for example, voice disorders, speech acceptability and speech intelligibility.
This study also showed no significant difference in inter-rater agreement between the repeated sentences and spontaneous conversation. This suggests that, although the repeated sentences somehow favor the consistency of the judgments of the same rater, this effect is not enough to increase the agreement between different raters. These results confirm what is already well established in the literature, i.e., that achieving a high level of agreement between different raters in the hypernasality judgment, using their own internal standards, is difficult due to its perceptual nature: hypernasality is characterized as a sensation and is considered the speech symptom for which high reliability is the most difficult to obtain (10,25). This is because the internal standards differ between raters. Studies report that the judgments of speech symptoms made by different raters are not comparable and that experience in the assessment of individuals with cleft palate does not guarantee a high level of agreement (25). Inter-rater agreement coefficients similar to those found in this study have been reported for both types of speech sample, ranging from moderate to substantial (13,26), moderate (3,9,17,24), fair to moderate (18) and fair (20).
It is noteworthy that no other study in the literature, to date, has compared the ratings of hypernasality degree in different types of speech sample for the same individual. These findings are important to show that, regardless of the speech sample produced by the same individual (spontaneous conversation or repeated sentences), the inter-rater agreement coefficients remain fair, meaning that the type of speech sample does not improve the reliability of the judgment between different raters. This result may be explained by the type of scale used to classify hypernasality. As in most studies in the literature, the present study used an ordinal scale, which has been the most widely used both in research and in clinical practice (27,28). However, due to the psychophysical nature of nasality, high agreement among different raters has been difficult to achieve using this method (29). This is because the scale divides the speech symptom into categories without quantifying the magnitude of the difference between each category, and listeners tend to subdivide the scale, especially its lower end, into smaller intervals (30). Thus, it is possible that this type of scale is not an effective method for hypernasality ratings, even for experienced evaluators.
Finally, the results of this study reinforce the need to adopt constant auditory training of listeners in research centers for individuals with cleft palate, in order to standardize the assessment criteria and calibrate professionals in an attempt to obtain reliable and comparable results with regard to the perceptual assessment of speech symptoms.

CONCLUSION
Sentence repetition improved the intra-rater reliability of the perceptual judgment of hypernasality, as agreement was higher for this type of speech sample. However, the speech sample had no influence on reliability among different raters.

Table 1. Statistical comparison between the intra-rater agreement indexes in the perceptual analysis of hypernasality for both speech samples (spontaneous conversation and repeated sentences): percentage of agreement (%), Kappa coefficient and its interpretation
Caption: *Spontaneous conversation vs. sentence repetition (Z test)

Table 2. Statistical comparison between the inter-rater agreement indexes in the perceptual analysis of hypernasality for both speech samples (spontaneous conversation and repeated sentences): percentage of agreement (%), Kappa coefficient and its interpretation