Learning factor influence on perceptual-auditory analysis

Accepted: October 31, 2017
Study conducted at Universidade Federal de São Paulo – Unifesp, in partnership with Centro de Estudos da Voz – CEV and Universidade de Brasília – UnB, as part of a master's thesis, São Paulo (SP), Brazil.
1 Universidade Federal de São Paulo – Unifesp, São Paulo (SP), Brazil.
2 Centro de Estudos da Voz – CEV, São Paulo (SP), Brazil.
3 Universidade de Brasília – UnB, Brasília (DF), Brazil.
Financial support: nothing to declare.
Conflict of interests: nothing to declare.


INTRODUCTION
Perceptual-auditory analysis is considered the gold standard for voice evaluation (1) and is used for diagnosis, for measuring treatment outcomes and for evaluating the immediate effects of vocal exercises (2)(3)(4). However, it is a subjective assessment, which results in poor inter-rater reliability and inconsistent voice evaluations (1,5).
Listeners with similar auditory experiences, that is, with similar professional activities and backgrounds, show better inter-rater reliability (6). Thus, it is necessary to establish basic specifications and/or provide training before any voice evaluation in order to reduce this common inter-rater variability. Furthermore, evaluators with auditory training and relevant clinical experience have better intra-rater reliability. Therefore, previous training is essential to ensure better outcomes and more consistent answers in perceptual-auditory analysis (1,7,8).
Learning is known to occur when new information is associated with previously learned, relevant concepts. In other words, to facilitate the learning process, the related information must already have been presented to the individual, who must have considered it relevant and useful.
Thus, to ensure learning, the new concept and/or information must first be presented. Subsequently, other related concepts and information must be presented and added to what was previously shown; only then will individuals actually retain and learn the new information (9).
With this in mind, to ensure better perceptual-auditory analysis, regardless of the type of vocal stimulus, the more training, the better (9). The more an individual is exposed to this activity, with different types and degrees of vocal deviation, the more he will be able to learn about it and from it. Therefore, he will be better prepared to perform a perceptual-auditory analysis, regardless of the target stimulus.
Even though the learning factor related to training is not fully understood, training is commonly used as a strategy to overcome the limited reliability of perceptual-auditory analysis (10). Training is also considered extremely important for professionals who will work with the human voice (11,12). Thus, experienced professionals in this field, that is, voice specialists with specific training for this task and relevant clinical experience, will possibly have better reliability and greater skill in analyzing different stimuli. They will be able to add to their experience the information and concepts to which they were previously exposed, possibly during their training to become voice specialists.
Therefore, the aim of the present study was to investigate the learning factor during a perceptual-auditory analysis in three different groups of listeners performing an unusual task.

METHODS
This retrospective study reanalyzed data from a previous study that was approved by the Committee for Ethics in Research under protocol number 1.281.837 (13); all participants signed an informed consent form.
This new analysis focused on issues not addressed in the previous study, which aimed to evaluate the quality of synthesized voices (13). The present study aimed to analyze the learning factor in the perceptual-auditory analysis performed by listeners with different levels of perceptual-auditory training and different backgrounds.
The stimuli used in this analysis were 18 human and 18 synthesized voices producing the Brazilian vowel "ae", sustained for 1 second. The listening session also included a random repetition of 50% of these stimuli (18 voices) to verify intra-rater reliability, for a total of 54 presentations.
The human voices were selected from the voice bank of a voice clinic that contains over 1000 controlled patient recordings. The voices were selected by convenience by three voice-specialist speech-language pathologists in order to represent different types (roughness, breathiness and strain) and degrees (mild, moderate and severe) of vocal deviation. Although very common in the voice clinic, voices with combined characteristics, such as roughness and breathiness or roughness and strain, were avoided because of the complexity of such stimuli. Thus, the human set comprised six voices for each type of deviation (three per gender, one at each degree of deviation), totaling 18 stimuli. All voices were from Brazilian adult patients.
The set of synthesized voices was selected from a voice bank of over 200 stimuli previously produced with VoiceSim, a physics-based synthesizer. The synthesizer represents the vocal tract as a series of concatenated tubes through which an acoustic wave propagates, and it includes models of the trachea, nasal cavity and paranasal sinuses. The vocal tract configuration used represented the Brazilian vowel "ae"; the vocal tract excitation was generated by a vocal fold model that treats the fold movement as a surface wave propagating through the mucosa, induced by the airflow.
To produce voices with different vocal deviations, several parameters were manipulated, such as jitter, signal-to-noise energy ratio, glottal area at the prephonatory position, stiffness coefficient and fundamental frequency (13).
The same three voice-specialist speech-language pathologists who selected the human voices also selected the synthesized voices. Each synthesized voice had to match the type and degree of deviation of one of the previously selected human voices. Therefore, a total of 18 synthesized stimuli, nine male and nine female, were selected.
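As a minimal sketch of how the stimulus set and listening session described above fit together, the Python code below enumerates the 3 x 2 x 3 design of human voices (type, gender, degree), pairs each with a synthesized voice of the same type, gender and degree, and adds a random 50% repetition to reach 54 trials. The function name `build_session`, the tuple layout, the gender pairing and the shuffled presentation order are illustrative assumptions; the source does not specify how the repeated stimuli were chosen or ordered.

```python
import random
from itertools import product

# Factors of the human stimulus set, as described in the text.
TYPES = ("roughness", "breathiness", "strain")
GENDERS = ("male", "female")
DEGREES = ("mild", "moderate", "severe")


def build_session(seed=0):
    """Hypothetical sketch: return a shuffled list of 54 trials.

    Each trial is a (source, type, gender, degree) tuple. Assumptions not
    stated in the source: each synthesized voice matches one human voice in
    type, gender and degree; the 18 repeated stimuli are drawn at random from
    the 36 originals; presentation order is random.
    """
    rng = random.Random(seed)
    # 3 types x 2 genders x 3 degrees = 18 human voices
    human = [("human", t, g, d) for t, g, d in product(TYPES, GENDERS, DEGREES)]
    # One paired synthesized voice per human voice: 18 synthesized (9 per gender)
    synthesized = [("synthesized", t, g, d) for _, t, g, d in human]
    originals = human + synthesized  # 36 unique stimuli
    # 50% random repetition: 18 stimuli are presented a second time
    repeats = rng.sample(originals, len(originals) // 2)
    session = originals + repeats  # 54 trials in total
    rng.shuffle(session)
    return session
```

Calling `build_session(seed=1)` yields a different but equally balanced ordering; fixing the seed makes a given session reproducible.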
A total of 269 invited adult listeners (over 18 years old) participated in this study. The listeners were divided into three groups according to their auditory experience, based on their professional background and voice experience.
The Experienced Group (EG) comprised 73 voice-specialist speech-language pathologists (5 men and 68 women), with a mean of 11.5 years in the profession and a mean age of 35.3 years. The Non-Experienced Group (NEG) comprised 84 speech-language pathologists who were not voice specialists (3 men and 81 women), with a mean of 6.2 years in the profession and a mean age of 29.5 years. The third group, the Naive Group (NG), comprised 112 listeners who were not speech-language pathologists, with a mean age of 32.4 years.
The perceptual-auditory task for all 269 listeners was to classify the 54 voices as human or synthesized. The same voice bank and perceptual-auditory task were used in another study with a different analytical focus (13). To better reflect the aim of the present study, that is, to analyze the learning factor during a perceptual-auditory analysis in three different groups performing an unusual task, the groups were renamed.
To verify listener consistency, a screening was performed. Listeners were considered consistent if they gave the same answer to at least 13 of the 18 repeated voices; for example, if voice 2 was repeated as voice 37, both presentations should receive the same response, regardless of whether it was right or wrong. Therefore, listeners needed a consistency rate (percentage of matching answers) of at least 72.2% (13/18).
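The screening rule can be expressed as a short check. The Python sketch below assumes that each listener's repeated items are available as (first answer, repeated answer) pairs; the function names are hypothetical. Only agreement between the two presentations is counted, not correctness, and the cutoff is applied to the integer count of matches so that 13 of 18 (about 72.2%) is handled exactly.

```python
# Threshold stated in the text: at least 13 matching answers over 18 repeated pairs.
MIN_MATCHES = 13
N_PAIRS = 18


def count_matches(pairs):
    """Count repeated pairs answered identically, whether right or wrong.

    `pairs` is a hypothetical list of (first_answer, repeated_answer) tuples,
    e.g. ("human", "synthesized").
    """
    return sum(1 for first, repeated in pairs if first == repeated)


def is_consistent(pairs, min_matches=MIN_MATCHES):
    # Compare integer counts to avoid floating-point issues at the 72.2% cutoff.
    return count_matches(pairs) >= min_matches


def consistency_percent(pairs):
    # Percentage of repeated pairs with matching answers.
    return 100 * count_matches(pairs) / len(pairs)
```

A listener with exactly 13 matching pairs passes the screening with about 72.2% consistency; one with 12 matches is excluded.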
The responses of 99 listeners were inconsistent and were therefore excluded from the final analysis. The EG, the group of voice specialists and thus the group with the greatest perceptual-auditory experience, had the fewest listeners excluded for lack of consistency (20.5%; p < 0.05, Student's t-test for paired samples). The NEG, the non-voice-specialist speech-language pathologists, had 39.2% of its listeners excluded, and the NG, the listeners who were not speech-language pathologists, had 45.5% excluded (EG vs NEG p = 0.011; EG vs NG p < 0.001; NEG vs NG p = 0.382). Thus, the non-voice-specialist speech-language pathologists and the naive listeners were similarly inconsistent, with no significant difference between them.
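The exclusion figures follow from the group sizes reported in this section. The Python sketch below recomputes them; the NG final size of 61 is not stated explicitly here and is derived as 170 - 58 - 51, which is an assumption of the sketch. The resulting percentages (about 20.5%, 39.3% and 45.5%) agree with the reported values to within rounding, and the 99 exclusions add up as stated. The statistical comparisons between groups are not reproduced.

```python
# Group sizes before screening (from the text).
INITIAL = {"EG": 73, "NEG": 84, "NG": 112}
# Group sizes after screening; NG = 170 - 58 - 51 is derived, not reported directly.
FINAL = {"EG": 58, "NEG": 51, "NG": 170 - 58 - 51}


def exclusion_summary(initial, final):
    """Return {group: (excluded count, excluded percent)} for each group."""
    summary = {}
    for group, n_initial in initial.items():
        excluded = n_initial - final[group]
        summary[group] = (excluded, 100 * excluded / n_initial)
    return summary


summary = exclusion_summary(INITIAL, FINAL)
for group, (excluded, percent) in summary.items():
    print(f"{group}: {excluded} excluded ({percent:.1f}%)")
```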
The final analysis included 170 individuals: 58 from the EG (2 men and 56 women, with a mean of 11.7 years in the profession and a mean age of 34.9 years), 51 from the NEG and 61 from the NG.
The learning factor analysis considered the responses of each group. For this analysis, the errors made by each listener at the beginning of the task (the first 18 voices) were compared with the errors made at the end of the task (the last 18 voices). An error was counted whenever a listener classified a human voice as synthesized, or vice versa.
It had already been observed (13) that individuals from the so-called non-experienced and naive groups made more errors in classifying human and synthesized voices. However, that study did not analyze whether these individuals improved their evaluations during the task, nor which elements and strategies the so-called experienced group might have used to achieve better performance in identifying human and synthesized voices. Thus, this study analyzed the occurrence of errors at the beginning and at the end of the perceptual-auditory task to determine whether a learning factor was present in each group, especially in the group of voice specialists.
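The comparison between the beginning and the end of the task can be sketched as follows. This Python code assumes that each listener's responses are stored in presentation order as (true source, answer) pairs; all names are hypothetical. A lower mean error rate in the last 18 trials than in the first 18 is only a descriptive indication of the learning factor; the statistical test behind Table 1 is not reproduced here.

```python
WINDOW = 18  # first and last 18 voices, as described in the text


def is_error(true_source, answer):
    # An error: a human voice classified as synthesized, or vice versa.
    return true_source != answer


def error_rate(trials):
    # Proportion of misclassified trials in a block.
    return sum(is_error(source, answer) for source, answer in trials) / len(trials)


def beginning_vs_end(trials, window=WINDOW):
    """Return (error rate in the first `window` trials, error rate in the last)."""
    return error_rate(trials[:window]), error_rate(trials[-window:])


def group_means(listeners, window=WINDOW):
    """Mean beginning and end error rates across a group's listeners."""
    rates = [beginning_vs_end(trials, window) for trials in listeners]
    mean_begin = sum(begin for begin, _ in rates) / len(rates)
    mean_end = sum(end for _, end in rates) / len(rates)
    return mean_begin, mean_end
```

A group shows the learning factor descriptively when its mean end error rate is below its mean beginning error rate; whether the difference is significant would require a paired test across listeners.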

RESULTS
Among the three groups, the learning factor was observed only for the EG; the voice specialists made fewer errors at the end of the perceptual-auditory task than at the beginning (Table 1).

DISCUSSION
Voice assessment is a perceptual phenomenon (14) influenced by both the listeners' previous training and their experience (1,7,8). Strategies such as training and anchor stimuli are used to ensure higher reliability of voice evaluations (10,15).
Listeners with previous training showed an advantage in the consistency screening performed in this research, since fewer of them were excluded for inconsistent answers. Therefore, the EG seems able to maintain consistency, probably owing to internal standards (15,16) formed during their training to become voice specialists. The NEG and the NG appear to have made more random decisions.
The learning factor analysis compared the error percentage for the first 18 stimuli presented in the task with the error percentage for the last 18 stimuli. The EG made fewer errors at the end of the task (Table 1); hence, listeners with voice-specialist experience probably presented the learning factor. The learning factor was also observed for the total sample; perhaps the positive outcome of the EG drove the total result, since the NEG and NG did not show it (Table 1).
Identifying human and synthesized voices is an unusual task that is not part of the speech-language pathologist's clinical routine; despite this, the voice specialists completed it with better performance. They made fewer mistakes in identifying human and synthesized voices than the other listeners (13); moreover, they seem to have presented the learning factor, since they made fewer errors at the end of the listening session. To complete the unusual task proposed in this research, the voice specialists may have used learning strategies related to perceptual-auditory analysis, probably acquired during their training to become voice specialists. With this in mind, it may be suggested that the voice specialists were self-regulated learners. Self-regulated learning depends on motivational, cognitive and metacognitive factors (17)(18)(19). Moreover, using learning strategies demands effort; thus, such strategies are only used when the individual considers the information valid (19). It may therefore be suggested that only the EG used learning strategies, since they are the group that best understands the importance and relevance of perceptual-auditory analysis and therefore put more effort into the task.
In addition, learning is known to depend on repetition, motivation and emotion (20)(21)(22). One motivational strategy is the intention to learn (17). The voice specialists were motivated: they chose to study to become voice specialists. They also had experience, which brings repetition that increases the intrinsic redundancy used to build auditory standards; therefore, they have a more robust system of internal standards for identifying voices (15,16).
Intrinsic redundancy concerns the processing of the acoustic signal that reaches the brain, including the structures of its neural pathways and its conversion into neural code (23). Neural connections become stronger with greater stimulation of the auditory pathway and, therefore, with more redundancy. Such redundancy may give the voice specialists more strategies for learning during the task and for retrieving auditory memories that improve their performance, as observed in this research.
Several studies address learning-style theories, which hold that each person learns differently and that each individual's style must be considered to ensure better performance. However, the studies that seek to demonstrate such theories have flawed methodologies or do not actually demonstrate that learning styles exist (24).
On the other hand, regardless of learning style, the best training for learning tasks related to geometry, for example, is known to be visuospatial activity (24). By analogy, the best training for learning to perform perceptual-auditory analysis is auditory activity, that is, listening to voices.
No training or task-specific instruction was given for this study's task. Listeners were simply asked to classify each voice as produced by a human or by a synthesizer, without further information and without any prior specific training.
Students with high abilities are known to achieve better results with less structured instruction than students with low abilities, whereas students with low abilities perform better with more structured instruction (25). It is worth noting that self-regulated learners are more independent in performing a task (17). With this in mind, the voice specialists seem to be better prepared, which again suggests that they are self-regulated learners.
As previously mentioned, this research did not offer any training or instruction on how to perform the task. Had it done so, the non-voice specialists and the naive listeners might also have learned during the task and presented the learning factor. This hypothesis should be tested, since it may also provide data on the minimum amount of training required for a listener to be considered able to perform a valid perceptual-auditory analysis; such training could even be included in voice specialization courses. In any case, the results indicate that auditory training is essential, since it provides learning strategies that support better perceptual-auditory analysis. Further, such analysis is less reliable when performed by someone without previous training.
These data reinforce the importance of perceptual-auditory training for all voice specialists and for professionals who aim to work with voice analysis; previous training allows the evaluator to use learning strategies and to apply greater cognitive flexibility in challenging auditory identification tasks. Such strategies make it easier for evaluators to learn and to improve their performance during a task, even when it is unusual, new and considered difficult by most listeners.

CONCLUSION
The voice-specialist speech-language pathologists (the experienced group) were the only group that presented the learning factor. Therefore, professional experience seems to positively influence perceptual-auditory analysis, which reinforces the importance of training to become a voice specialist. In addition, the voice specialists seem to be better prepared for the task and able to benefit from the task itself; thus, they appear to use learning strategies and perform better, even on an unusual task.