Impact of inter-judge agreement on perceptual judgment of nasality

Purpose: To investigate the effect of perceptual inter-judge agreement on hypernasality on the prediction of velopharyngeal (VP) closure. Methods: Two logistic regression models were developed to verify whether VP closure, rated as adequate, borderline, or inadequate by the pressure-flow technique, could be predicted from the following perceptual characteristics: degree of hypernasality (absent, mild, moderate, severe) and the presence or absence of nasal air emission and nasal rustle, as judged by three experienced speech-language pathologists. In the first model, 100 speech samples with a moderate agreement rate for hypernasality (kappa coefficient: 0.41) were used. In the second model, the 43 speech samples with perfect agreement among judges were included. The χ² test was used to compare the models (significance level: p≤0.05). Results: In the first model, 65 of the 100 samples were assigned to the correct VP closure category, 42 adequate and 23 inadequate. Borderline VP closure was not predicted. The second model assigned 31 of the 43 samples to the correct category: 21 adequate, 5 borderline, and 5 inadequate. There was no significant difference (p=0.526) between the two models. However, the proportion of correct predictions in the second model was 7 percentage points higher than in the first (72% versus 65%), and the second model also predicted borderline VP closure. Conclusion: These results show the importance of a high index of inter-judge agreement when subjective parameters of speech evaluation are used, especially when they are compared with an instrumental evaluation. This suggests the need for strategies to train and calibrate judges in perceptual judgment in order to improve the reliability of auditory-perceptual assessment.

INTRODUCTION
The auditory-perceptual assessment of speech is an important instrument for evaluating velopharyngeal function, because the speech characteristics identified in this kind of assessment provide clues about the extent of velopharyngeal failure (1-3). Among the speech symptoms directly related to velopharyngeal dysfunction (VPD) is hypernasality, a perceptual and therefore subjective phenomenon. This subjectivity affects the reliability of the auditory-perceptual assessment of speech and has been a research target for many years, according to a recent review (4).
To ensure greater reliability of results in research on the speech of individuals with VPD, the literature recommends the use of different evaluators for the perceptual judgment of speech symptoms (5,6). However, a high level of agreement may be difficult to achieve because of several variables.
This study was designed considering the possibility that the level of agreement among different evaluators in the judgment of speech symptoms may influence the prediction of velopharyngeal closure (VPC).
It investigated the effect of the level of inter-evaluator agreement in the perceptual judgment of hypernasality on the prediction of VPC.

METHODS
This study was approved by the research ethics committee of the Hospital for Rehabilitation of Craniofacial Anomalies, University of São Paulo (document No. 360/2010 and No. 254/2012 SVAPEPE-CEP), and all the participants signed the informed consent form.
Inter-evaluator agreement on the judgment of hypernasality was measured with the kappa coefficient.
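As an illustration of how a kappa coefficient can be obtained for three evaluators, the sketch below computes Fleiss' kappa. The text does not state which kappa variant was used, so the choice of Fleiss' kappa is an assumption, and the ratings are hypothetical values invented for the example, not study data. The function name `fleiss_kappa` is likewise illustrative.

```python
from collections import Counter


def fleiss_kappa(ratings):
    """Fleiss' kappa for a list of samples, each a list with one label per rater.

    Assumption: every sample was rated by the same number of raters.
    """
    n_samples = len(ratings)
    n_raters = len(ratings[0])
    if any(len(sample) != n_raters for sample in ratings):
        raise ValueError("every sample needs the same number of ratings")

    category_totals = Counter()
    agreement_sum = 0.0
    for sample in ratings:
        counts = Counter(sample)
        category_totals.update(counts)
        # Proportion of rater pairs that agree on this sample.
        agreeing_pairs = sum(c * (c - 1) for c in counts.values())
        agreement_sum += agreeing_pairs / (n_raters * (n_raters - 1))

    observed = agreement_sum / n_samples
    total_ratings = n_samples * n_raters
    # Agreement expected by chance from the overall category proportions.
    expected = sum((c / total_ratings) ** 2 for c in category_totals.values())
    if expected == 1.0:
        raise ValueError("kappa is undefined when only one category is used")
    return (observed - expected) / (1 - expected)


# Hypothetical hypernasality ratings from three evaluators (not study data).
ratings = [
    ["absent", "absent", "absent"],
    ["mild", "mild", "moderate"],
    ["moderate", "moderate", "moderate"],
    ["severe", "moderate", "severe"],
    ["absent", "mild", "absent"],
    ["mild", "mild", "mild"],
]

kappa = fleiss_kappa(ratings)
# Samples on which all evaluators gave the same rating.
unanimous = [sample for sample in ratings if len(set(sample)) == 1]

print(f"Fleiss' kappa: {kappa:.3f}")
print(f"Samples with perfect agreement: {len(unanimous)} of {len(ratings)}")
```

The `unanimous` list mirrors the selection step used for the second model, in which only samples with full agreement among the evaluators were kept.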
Two logistic regression models were developed to verify whether the VPC determined by the pressure-flow technique (adequate, borderline, or inadequate closure) could be predicted from the following speech characteristics: degree of hypernasality (absent, mild, moderate, severe) and the presence or absence of nasal air emission and nasal snoring. These characteristics were assessed perceptually by three speech-language pathologists with 12 years of experience, on average, in the auditory-perceptual assessment of speech in patients with cleft lip and palate. In the first model, 100 speech samples with a moderate level of agreement for hypernasality (kappa coefficient: 0.41) were used. The second model used the 43 of these 100 speech samples on which the evaluators agreed fully. The proportions of correct predictions of the two models were then compared with the χ² test at a significance level of 5%.
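The comparison of the two proportions can be sketched as a χ² test on a 2×2 table built from the counts reported in the abstract (65 correct of 100; 31 correct of 43). The code below is a minimal sketch, not the authors' procedure: the function name `chi_square_2x2` is illustrative, and the test treats the two sets of samples as independent groups. With Yates' continuity correction the sketch gives p ≈ 0.526, which agrees with the reported value; the text does not say whether that correction was applied, so this is an inference.

```python
import math


def chi_square_2x2(correct_a, total_a, correct_b, total_b, yates=True):
    """Chi-square test (1 degree of freedom) comparing two proportions.

    Returns (statistic, p_value). `yates=True` applies the continuity
    correction, which is an assumption of this sketch.
    """
    observed = [
        [correct_a, total_a - correct_a],
        [correct_b, total_b - correct_b],
    ]
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / grand_total
            deviation = abs(observed[i][j] - expected)
            if yates:
                deviation = max(0.0, deviation - 0.5)
            statistic += deviation ** 2 / expected

    # Survival function of the chi-square distribution with 1 df:
    # P(X > s) = erfc(sqrt(s / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Counts reported in the abstract: model 1 (65/100) versus model 2 (31/43).
stat_corrected, p_corrected = chi_square_2x2(65, 100, 31, 43, yates=True)
stat_plain, p_plain = chi_square_2x2(65, 100, 31, 43, yates=False)

accuracy_gap = 31 / 43 - 65 / 100  # difference in proportion correct

print(f"With continuity correction: chi2={stat_corrected:.3f}, p={p_corrected:.3f}")
print(f"Without correction:         chi2={stat_plain:.3f}, p={p_plain:.3f}")
print(f"Difference in accuracy: {accuracy_gap * 100:.1f} percentage points")
```

Either way the p-value is far above the 5% significance level, consistent with the statement that the two models did not differ significantly.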

RESULTS
Table 1 shows the number of samples predicted by the first logistic model in relation to the actual classification of VPC determined by the pressure-flow technique.
According to this model, built from the 100 samples that obtained a moderate level of inter-judge agreement in the perceptual judgment of the degree of hypernasality, 65 were predicted in the correct category. However, borderline VPC was not predicted.
Table 2 shows the number of predictions of the second logistic model in relation to the actual VPC classification, using the 43 samples that obtained perfect agreement among the evaluators in the perceptual judgment of the degree of hypernasality.
According to this model, 31 of the 43 samples were classified correctly. This model predicted borderline VPC.
Although the comparison between the two models did not show a significant difference (p=0.526), the second model, built from the samples with full inter-judge agreement on the degree of hypernasality, showed a proportion of correct predictions 7 percentage points higher than the first model and also predicted borderline VPC.

DISCUSSION
There is a consensus in the literature that the auditory-perceptual assessment of speech is the means by which the speech-language pathologist can identify changes in speech, classify their severity and, thus, define the conduct and assess the treatments performed (7-11). However, it is a subjective method and, therefore, subject to mistakes and to the influence of various factors. The main one is related to the internal standards of each listener, that is, the individual references of each evaluator, which differ from one another. This led clinicians and researchers to search for strategies to enhance the perceptual assessment. The main change was the use of recorded speech samples, which allowed speech to be assessed by more than one listener. Since then, several studies have shown the importance of having speech symptoms analyzed by different evaluators and of obtaining agreement among them.
The impact of inter-judge agreement CoDAS 2014;26(5):357-9
Specifically with regard to hypernasality, a high level of agreement between evaluators is difficult to obtain. This is because it is a perceptual phenomenon affected by many factors, for example, the listener's experience, which determines the internal standards of each listener (12). Although the evaluators consulted had 12 years of experience, on average, in speech assessment in the presence of cleft palate and a similar academic background in the speech assessment process, having been trained by the same reference center, differences related to the internal standards of each evaluator may still be expected. It is speculated that this is the reason for the moderate level of agreement on the degree of hypernasality in the first model, a result also obtained in other studies (13-15). This probably led to the lower percentage of correct predictions shown by this model, making the reliability of the VPC prediction questionable. Thus, a second logistic model was developed, based on the 43 samples for which inter-judge agreement in the judgment of hypernasality was perfect. In this case, there was an increase in the percentage of correct predictions of VPC based on the perceptual speech characteristics. In addition, this second model predicted borderline VPC, which the previous model did not. This result was due to the full agreement of the evaluators on the degree of hypernasality, showing, therefore, the effect of agreement levels among different evaluators when subjective parameters of speech are being assessed, especially those concerning the listener's perception.

CONCLUSION
The high level of inter-judge agreement on the degree of hypernasality positively affected the prediction of VPC. This means that, in addition to having the perceptual judgment made by more than one listener in the auditory-perceptual assessment of speech characteristics, it is essential to use strategies that ensure high levels of agreement among them, to improve the reliability of results.
*RHS was responsible for the study's original idea, data collection, data analysis, and the writing of the article; ACASFO collaborated in data collection and tabulation; APF collaborated in the collection and analysis of the data and the writing of the article; MHS participated in the statistical analysis of the data and the writing of the article; IEKT participated in the writing of the article; RPY was responsible for the project and outlining of the study and for overall guidance during the stages of execution and preparation of the manuscript.

Table 1. Distribution of the 100 samples of speech predicting the velopharyngeal closure, from the hypernasality degree and the presence or absence of nasal air emission and nasal snoring

Table 2. Distribution of the 43 samples of speech predicting the velopharyngeal closure, in relation to the hypernasality, nasal air emission,
*Model's correctness
Caption: VPC = velopharyngeal closure