Emphatic accent used by individuals with and without voice and speech training



Maria Cristina de Menezes Borrego (I); Mara Behlau (II)

(I) Graduate Specialization in Voice, Centro de Estudos da Voz – CEV – São Paulo (SP), Brazil
(II) Graduate Program in Human Communication Disorders, Universidade Federal de São Paulo – UNIFESP – São Paulo (SP), Brazil; Centro de Estudos da Voz – CEV – São Paulo (SP), Brazil





PURPOSE: To investigate how individuals with and without voice training use emphatic accent in two previously selected words during a reading.
METHODS: Seventy-seven individuals aged between 19 and 57 years were divided into two groups: 51 students from a radio training course – TG (trained group); and 26 subjects with no voice and speech training – UnTG (untrained group). Individuals read a radio report twice, emphasizing a different word in each reading: "negotiates" and "reforms". The readings were recorded at two moments, two months apart, which corresponded to the beginning and end of the radio training course attended by the TG. Voice samples were submitted to: auditory-perceptual analysis of the occurrence, evaluation and use of emphasis; visual analysis of the spectrographic trace for delimitation of the pauses; acoustic analysis of the duration and fundamental frequency of the emphases. Results were submitted to statistical analysis.
RESULTS: The TG received higher scores than the UnTG for the quality of emphasis, and there was no difference between groups in the occurrence of emphasis or in the resources used. The word "reforms" was emphasized more often and was better evaluated than the word "negotiates". The TG used fewer pauses than the UnTG. Acoustic analysis showed that the word "reforms" was longer than "negotiates" in the UnTG. The mean fundamental frequency was higher for "negotiates".
CONCLUSION: Both groups demonstrated that the use of emphasis follows the individuality of speakers. The TG showed greater ability in the distribution of pauses. The words were emphasized differently due to syntactic and semantic aspects.

Keywords: Voice; Voice training; Voice quality; Speech acoustics; Communications media




In the oral communication process, voice and speech are the elements that give emotion and sense to the transmitted message. Aspects related to vocal quality, such as intensity, voice resonance and frequency, articulation pattern, duration, pause, rhythm, and speech rate express different emotions depending on the environment and the context in which they occur. These parameters are considered non-verbal communication components, and are also defined as prosodic suprasegmental elements. They offer the speaker several possibilities of expression, and their selection and use depend on the speaker's personality traits and attitude(1).

Thus, prosody is an important resource for the transmission of meaning, as well as for speech interpretation and comprehension. Many studies have investigated the way speakers manipulate prosodic resources to ease the transmission of the real meaning of speech. Other studies have analyzed how listeners use prosodic cues to better understand the received message(2,3). During speech, one of the strategies used by individuals to ensure clarity and understanding is to highlight one or more words that carry important information. This may be done in several ways according to lexical, syntactic, and prosodic choices. Regarding prosodic resources, elements of vocal quality, melody, and speech dynamics are used to express the meanings of speech, including variations in tone, duration and intensity, and the use of silent pauses(1,4-9).

Prosody is a natural characteristic of the speaker. It may also be used consciously and rationally in speech(10), both in spontaneous conversation and in professional communication. Radio and television announcers, reporters, and actors, among other voice professionals, need to adjust their communication to the specific demands of their professional activity. The radio announcer is an example of a versatile professional who acts in a demanding and diverse market, manipulating different emphasis resources in order to transmit the desired message clearly and precisely.

The speech-language pathologist is one of the professionals involved in radio training. For over 20 years, the Brazilian speech-language pathology literature on the professional voice has described the voice usage and vocal hygiene habits of radio announcers, presented exercises created and selected for training these professionals(11), and described the communication profiles of AM and FM station(12,13), sports(14), and commercial(15,16) radio announcers.

Speech-language pathology training directed to radio announcers without voice complaints aims not only to promote the health of these professionals, but also to develop communication abilities suited to radio. Regarding health promotion, guidelines and exercises are provided aiming at voice plasticity and resistance. In addition, the future radio announcer undergoes voice and speech training, including reading and improvisation strategies and broadcasting presentations, in activities that are close to the real situations faced in the professional routine(17,18). Some important characteristics of radio announcing are voice flexibility and the use of varied emphasis resources(19).

The purpose of this research was to investigate how emphasis resources are used by trained radio announcers and by untrained individuals, and to verify whether there are evident differences between the two groups of speakers in the reading of selected words.



Participants were 77 individuals distributed into two groups: 51 students of a radio training course, 33 men and 18 women, with ages ranging from 19 to 57 years, who composed the trained group (TG); and 26 untrained volunteers, 16 men and ten women, with ages ranging from 20 to 48 years, who composed the untrained group (UnTG). All the TG participants had been selected through a test conducted by the educational institution, were regularly enrolled in the course subjects, attended at least 75% of the classes, and did not present any signs or symptoms of voice disorders. The UnTG participants were employees of the institution who held administrative positions or had functions that did not involve a specific voice demand; they had no previous voice training up to the data collection period, and also did not present signs or symptoms of voice disorders. No participant in either group had reading complaints or difficulties.

The research project was approved by the Research Ethics Committee of Universidade Federal de São Paulo/Hospital São Paulo, under protocol number 0843/06. All participants signed the Free and Informed Consent.

The training received by the TG participants is part of a radio announcer training course that lasts two months on average. The course is essentially practical; classes are held in groups of approximately 14 students, and the subjects offered are AM and FM Radio Announcing Practice, Vocal Expression, and Interpretation, taught, respectively, by a radio announcer, a speech-language pathologist, and an actor, all of them teachers at the institution. The purpose of the radio announcing subjects is to promote the practice of several genres and formats of radio announcing using voice exercises in AM and FM language, news, commercial, and sports styles, among others. The speech-language pathologist addresses vocal health guidance and the training and improvement of communication abilities, with the aim of promoting voice flexibility and resistance, adjusting voice and speech to radio language, and training emphasis resources and communication expressiveness. Through the analysis of a variety of texts, the Interpretation subject offers learning situations that explore the different intentions of the announcer, in addition to developing concentration as a basis of expression.

The following news excerpt was selected for reading: "Trying to reverse the crisis scenario in the country, the president negotiates reforms with PMDB. Lula expects to be out of negotiations reinforced." The text is part of the training material; it had not been previously used in classes, and its content was randomly chosen. Participants read the news twice and were asked to emphasize, in each reading, one of the two words underlined and in bold in the text: "negotiates" and "reforms". The readings were recorded at two moments, moment 1 and moment 2, with an interval of two months between them, which corresponded to the beginning and the end of the training course for the TG. The text was read at first sight, and the participants' only contact with the material was during the recording sessions.

Reading samples were recorded in the radio studio of the institution, where all the practical classes are taught. Participants were seated, keeping a distance of approximately 10 cm between mouth and microphone. The following equipment was used for voice recording: 12-channel recording/play mixer (Audioarts Engineering® R60); power amplifier system (Yamaha® P1600); Maxxtro® computer 256 Pentium IV with the Windows XP operating system; and unidirectional microphone (Shure® SM58). The voices were recorded using the Sony Sound Forge 8.0 (version 8.0b) software, installed on the studio's computer, in mono wave format, at a 44,100 Hz sampling rate and 16-bit resolution. The same recording conditions were kept for both groups.

All the recordings were stored on CD and then edited in the PRAAT software (version 4.4.28, Boersma, Weenink), installed on an Acer® Travelmate 4100 computer with an Intel Pentium M Processor 740, 512 megabits memory, touchpad mouse, and the Windows XP operating system. The edited material was prepared for the following analyses: auditory-perceptual analysis of the occurrence, evaluation and use of emphasis regarding tone range, intensity and duration; spectrographic acoustic analysis for visual identification of the tracing and delimitation of pauses; and measurement of the duration and fundamental frequency parameters of the emphasized words.

Auditory-perceptual analysis was conducted in one session by three speech-language pathologists with expertise in professional voice, each working individually. The judges listened to each participant's readings, edited onto a CD and arranged in random order, with the readings from moments 1 and 2 presented without identification. Ten percent of the reading samples were repeated to test intra- and inter-judge reliability.
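The construction of a blinded listening order with 10% repeated samples can be sketched as follows. This is a minimal illustration, not the procedure actually used to prepare the CD: the function name `build_listening_order`, the sample identifiers, and the fixed random seed are all assumptions introduced here.

```python
import random

def build_listening_order(sample_ids, repeat_fraction=0.10, seed=0):
    """Return a shuffled playlist in which a fraction of samples appears twice.

    sample_ids      -- anonymized reading identifiers (hypothetical labels)
    repeat_fraction -- share of samples to repeat for reliability checks
    seed            -- fixed seed so the order can be reproduced (assumption)
    """
    rng = random.Random(seed)
    n_repeats = round(len(sample_ids) * repeat_fraction)
    # Draw the repeated samples without replacement, so no sample appears
    # more than twice in the final playlist.
    repeats = rng.sample(list(sample_ids), n_repeats)
    playlist = list(sample_ids) + repeats
    rng.shuffle(playlist)
    return playlist

# Hypothetical identifiers; they carry no group or moment information.
ids = [f"s{i:02d}" for i in range(20)]
playlist = build_listening_order(ids)
```

Comparing a judge's two ratings of each repeated sample gives the intra-judge check, while comparing ratings across judges gives the inter-judge check.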

Using a protocol specially developed for this task, each judge marked whether the participant had emphasized the selected words, evaluated the quality of the emphasis by marking his or her impression on a 10-cm scale ranging from bad (extreme left) to excellent (extreme right), and marked which emphasis resources were used according to the following parameters(4-9): ascending or descending tone range, intensity increase or decrease, and increase or decrease of word duration. After the protocol scale was marked, the judges' impressions regarding emphasis evaluation were measured with a professional millimeter ruler, and the values were recorded in centimeters.

The PRAAT software (version 4.4.28, Boersma, Weenink) was used to visually identify pause boundaries and to acoustically analyze the duration and fundamental frequency parameters of the emphasized words. The broadband spectrogram was selected because it presented good time resolution and a sharp tracing, which facilitated the editing and the analyses, especially in connected speech.

To study pause distribution, the silent pauses accompanying the emphasized words were analyzed. Pauses were delimited by visual identification on the spectrogram tracing, with auditory-perceptual support to confirm their occurrence. The presence and absence of pauses were coded numerically. For the duration measures, each emphasized word was isolated with the cursor and, with perceptual help, its acoustic limits were established; the software then provided a duration value in seconds (s), read directly from the screen. Regarding fundamental frequency, the mean, minimum, and maximum values were selected. The spectrographic analysis was performed in random order by a single person, the researcher. The values were confirmed by a second measurement, made without access to the first one.
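The duration and fundamental frequency measures described above can be illustrated with a short sketch. It assumes a pitch track has already been exported as parallel lists of frame times and F0 values; the function name `word_measures`, the convention that 0 marks an unvoiced frame, and the numbers are invented for illustration and are not data from the study.

```python
from statistics import mean

def word_measures(times, f0_values, start_s, end_s):
    """Return duration (s) and mean/min/max F0 (Hz) for one word interval.

    times          -- frame times in seconds (hypothetical pitch-track export)
    f0_values      -- F0 per frame in Hz; 0 or None marks an unvoiced frame
    start_s, end_s -- word boundaries set by the analyst on the spectrogram
    """
    if end_s <= start_s:
        raise ValueError("end_s must be greater than start_s")
    # Keep only voiced frames that fall inside the word boundaries.
    voiced = [
        f0 for t, f0 in zip(times, f0_values)
        if start_s <= t <= end_s and f0
    ]
    if not voiced:
        raise ValueError("no voiced frames inside the word interval")
    return {
        "duration_s": end_s - start_s,
        "f0_mean_hz": mean(voiced),
        "f0_min_hz": min(voiced),
        "f0_max_hz": max(voiced),
    }

# Invented values for illustration only.
times = [0.00, 0.01, 0.02, 0.03, 0.04, 0.05]
f0 = [0, 180.0, 200.0, 220.0, 0, 190.0]
result = word_measures(times, f0, start_s=0.01, end_s=0.04)
```

Unvoiced frames are excluded so that they do not pull the mean and minimum toward zero.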

The Wilcoxon Signed-Rank test and Cronbach's Alpha were used to verify intra- and inter-judge reliability, respectively. Because inter-judge reliability was unsatisfactory, the auditory-perceptual data from a single judge were used in the research.
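For reference, Cronbach's Alpha for k judges is k/(k-1) multiplied by one minus the ratio of the summed per-judge variances to the variance of the summed scores. The sketch below applies that standard formula, treating each judge as an "item"; the function name `cronbach_alpha` and the ratings are hypothetical, and the study itself used SPSS.

```python
from statistics import variance

def cronbach_alpha(ratings_by_judge):
    """Cronbach's alpha, treating each judge as one 'item'.

    ratings_by_judge -- list of lists; each inner list holds one judge's
                        scores for the same samples, in the same order
    """
    k = len(ratings_by_judge)
    if k < 2:
        raise ValueError("at least two judges are required")
    n = len(ratings_by_judge[0])
    if any(len(r) != n for r in ratings_by_judge):
        raise ValueError("every judge must rate the same samples")
    # Sum of the sample variances of each judge's scores.
    judge_var_sum = sum(variance(r) for r in ratings_by_judge)
    # Variance of the per-sample totals across judges.
    totals = [sum(scores) for scores in zip(*ratings_by_judge)]
    total_var = variance(totals)
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    return k / (k - 1) * (1 - judge_var_sum / total_var)

# Hypothetical 10-cm scale scores from three judges for four readings.
judges = [
    [7.5, 4.0, 8.2, 5.1],
    [7.0, 4.5, 8.0, 5.5],
    [6.8, 3.9, 8.5, 4.9],
]
alpha = cronbach_alpha(judges)
```

Values close to 1 indicate that the judges rank the samples consistently; the threshold for "satisfactory" agreement is a convention that the source does not specify.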

To analyze the data on occurrence and use of emphasis, the Chi-square test adjusted by Fisher statistics was used to compare moments 1 and 2; the Wilcoxon Signed-Rank test was used to verify possible differences between words; and the Mann-Whitney test was used to compare the groups. The Friedman test, complemented by the Wilcoxon Signed-Rank test, was selected to verify whether there was a difference between the selected parameters in emphasis use. For the emphasis evaluation data, the Student's t-test for paired data was used to verify possible differences between moments 1 and 2, and to compare the emphasized words. The Levene test for equality of variances was selected for the comparison between groups.
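As an illustration of how a comparison of occurrence counts between two moments might be computed, the sketch below implements a two-sided Fisher exact test for a 2x2 table. Reading "Chi-square test adjusted by Fisher statistics" as Fisher's exact test is an assumption, as are the function name `fisher_exact_two_sided` and the counts; the study's own computations were run in SPSS.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell,
        # with all row and column totals held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    # Sum the probabilities of every table no more likely than the observed
    # one; the small tolerance guards against floating-point ties.
    return sum(
        p for p in (prob(x) for x in range(lo, hi + 1))
        if p <= p_obs * (1 + 1e-9)
    )

# Hypothetical counts: rows are moment 1 and moment 2,
# columns are "emphasis perceived" and "emphasis not perceived".
p_value = fisher_exact_two_sided(45, 6, 47, 4)
significant = p_value < 0.05
```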

Regarding the data on occurrence and distribution of pauses accompanying the emphasized word, the Chi-square test adjusted by Fisher statistics was used to compare moments 1 and 2; the Wilcoxon Signed-Rank test was used to verify possible differences between words; and the Mann-Whitney test was used for comparison between groups. The Student's t-test for paired data was applied to evaluate the duration and fundamental frequency parameters, the latter analyzed separately by gender.
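The paired comparison of word durations can be sketched as follows. The code computes only the t statistic and its degrees of freedom; obtaining the p-value requires the t distribution from a statistics package and is left out. The function name `paired_t` and the durations are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(first, second):
    """Return (t statistic, degrees of freedom) for paired measurements.

    first, second -- measurements from the same speakers, in the same order
                     (for example, word duration in moments 1 and 2)
    """
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equally long lists with at least 2 pairs")
    diffs = [x - y for x, y in zip(first, second)]
    sd = stdev(diffs)
    if sd == 0:
        raise ValueError("all paired differences are identical")
    n = len(diffs)
    # t = mean difference divided by its standard error.
    t_stat = mean(diffs) / (sd / sqrt(n))
    return t_stat, n - 1

# Hypothetical durations in seconds for four speakers.
moment_1 = [0.50, 0.55, 0.60, 0.65]
moment_2 = [0.45, 0.52, 0.54, 0.62]
t_stat, df = paired_t(moment_1, moment_2)
```

A positive t statistic here means the word was longer in moment 1 than in moment 2 on average.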

A significance level of 5% (α=0.050) was adopted for all statistical tests applied. The Statistical Package for Social Sciences software (SPSS, version 13.0) was used for these analyses.



Data were collected and results were compared considering the following situations: two analysis moments, moment 1 and 2; two emphasized words, "negotiates" and "reforms"; two studied groups, trained (TG) and untrained (UnTG).

Regarding the occurrence of emphasis, no differences were found between moments or between the studied groups (Table 1). Regarding the emphasized words, emphasis on "reforms" occurred more often than on "negotiates" in the TG, both in moments 1 and 2. Its occurrence was also higher in moment 2 of the UnTG. As for the occurrence of pauses, in the TG the pauses accompanying "negotiates" occurred twice as often in moment 1 as in moment 2. The UnTG presented more pauses than the TG accompanying both studied words, in both moments 1 and 2.

Regarding the evaluation of emphasis, data showed there was no difference between moments (Table 2). Nevertheless, in the comparison between words there was a difference between the evaluation of "negotiates" and "reforms". In the TG, "reforms" was better evaluated both in moments 1 and 2. In the UnTG, the evaluation of "reforms" was better in moment 2. When the groups were compared, the emphasis in the TG was better evaluated than in the UnTG. The word "reforms" was better evaluated in the TG, including at the first recording moment.

Regarding the use of emphasis, there was no difference between recording moments or between the studied groups (Table 3). The comparison of the studied words showed that emphasis on "negotiates" was perceived through an increase in word duration in the TG, and this was the resource used in 53% of the readings in moment 1 and 57% of the readings in moment 2. No specific emphasis resource predominated for the word "negotiates" in the UnTG, where the emphasis was perceived through variation of intensity and increase in duration. The word "reforms" was marked by an increase in vocal intensity in both moments and in both studied groups. In the TG, this resource was perceived in 90% of the emphases in moment 1, and in 82% of the emphases in moment 2. In the UnTG, intensity increase occurred in 88% of the emphases in moment 1, and in 92% of the emphases in moment 2.

Regarding the duration parameter, the studied words had longer duration in moment 1 than in moment 2 in the TG (Table 4). Emphasis was also used in different ways between the words: in the UnTG, "reforms" had longer duration than "negotiates" both in moments 1 and 2; in the TG, "reforms" had longer duration in moment 2. Comparison of the groups showed that the duration of "reforms" was longer in the UnTG in moments 1 and 2.

Data showed difference in the use of fundamental frequency between the studied words (Table 5). In women, the mean fundamental frequency of the word "negotiates" in the TG was higher than "reforms", both in moments 1 and 2. The same was observed in moment 1 of the UnTG. In men, the mean, minimum and maximum fundamental frequency values of the word "negotiates" were higher than "reforms" in the TG, both in moments 1 and 2. In the UnTG the mean fundamental frequency value of "negotiates" was higher than "reforms" in both moments. The minimum fundamental frequency value of "negotiates" was higher than "reforms" in moment 2 of the UnTG.



The prosodic elements of speech grant meaning to the message, representing the speaker's attitude and personality. Emphasis is one of the elements responsible for transmitting the meaning of the information, and it is handled by the speaker according to the desired intention. Several emphasis resources are used to highlight important information in speech, among them tone range and voice intensity, modification of the duration of a word, and use of pauses.

The occurrence of emphasis and of silent pauses accompanying the emphasized word are analyzed together in the present study. Emphatic accent was observed from the first recording moment, with no differences between trained and untrained participants, which shows that the task of highlighting previously selected words during text reading is simple to carry out. These data are reinforced by evidence that utterances of untrained speakers provide valid information for the investigation of prosodic perception and emphatic accent(2,8). Regarding pauses, their function is known to be to highlight text information, contributing to the understanding of the transmitted message(1,3). To analyze their distribution in this research, we considered the silent pauses with emphatic function accompanying the studied words. The speech of radio announcers and television reporters is characterized by pauses with homogeneous distribution(15,16), used coherently with the text meaning(20). The higher occurrence of pauses in the UnTG suggests that untrained participants did not have the ability needed to use pauses in a balanced way. In the TG, on the other hand, a lower occurrence of pauses was observed, which may be associated with a more homogeneous, coherent and proper use of this resource. Trained participants used pauses less frequently, selecting other prosodic resources to emphasize the words and ensuring more fluency in their radio announcement(1).

This may have been one of the factors responsible for the differences observed in the evaluation of the studied groups. The TG participants were better evaluated than the UnTG participants, showing more ability in emphasizing words during the reading task. All participants were asked to read the text as news, which is a well-known reading pattern for trained individuals, since it is part of the training exercises. Even in moment 1 of recording, which corresponds to the beginning of the course, the students were better evaluated, because they were already involved in the radio universe and, therefore, were more familiar with the task. Hence, the TG participants adequately adjusted their speech to the chosen text, which explains the better evaluation received.

Regarding the use of emphasis, no difference was found between recording moments or between the studied groups. Emphasis on the word "negotiates" was perceived through an increase in duration in the TG, and no specific resource predominated in the UnTG, which used both intensity and duration. The emphasis resources related to tone range, intensity and duration occur concomitantly. In Brazilian Portuguese, the main correlates of accent at the lexical level are duration, intensity and vocal quality(21). These parameters are related to the overall prominence of the tonic syllable, and not only to the emphasized word. To highlight words in Brazilian Portuguese, which is an accent language, one of the emphasis resources used is the increase of duration, which may be achieved by lengthening or syllabicating the words that the speaker wishes to emphasize(1). The word "reforms" was marked by an increase in voice intensity in both moments and in both studied groups. The increase in intensity for emphatic accent may be related to the reliability trait that is part of news announcement(22). This choice was not determined by the training, since it occurred in moment 1 and there was no difference between groups in either moment. Therefore, the participants selected the intensity resource to emphasize, whether or not they were conscious of the strategy, probably based on auditory references acquired throughout life, mainly among those who have the habit of listening to news on the radio.

Comparing the use of emphasis in the studied words, it was noted that the emphasis in "reforms" was more often perceived, was better evaluated and was highlighted by intensity variation, a resource related to reliability, showing that this word was spontaneously selected by the participants to be emphasized. The choice of the word emphasized in speech is influenced by aspects related to the syntactic and semantic structure of the phrase, its length, number of syllables, and issues related to standard news reading, elements that only partially determine the prosodic phrasing(2,4,6-10).

Regarding duration, emphatic accent was used in different ways between the words: in the UnTG, "reforms" had longer duration than "negotiates"; in the TG, "reforms" had longer duration in moment 2. The studied words had longer duration in moment 1 in the TG and, in the comparison between groups, the duration of "reforms" was longer in the UnTG. These data do not agree with the literature, which points out that emphasized words have longer duration and are uttered more slowly, with better defined and marked articulation(1,4,6,8,15,23,24), a pattern that would be expected for the trained participants. However, this result may be due to the type of text selected; news reading is influenced by message content and radio announcement style(20,25). News is usually read dynamically, which decreases speech time and would modify the duration of the phrase in general and of the emphasized word specifically.

Regarding fundamental frequency, results were separated according to gender, since men and women have different intonation contours in radio announcement(22,25). This research investigated the frequency variation in emphasized words, and there was no difference between moments or between groups in either gender. These data reveal that the use of emphasis resources related to the variation of fundamental frequency was not determined by voice and speech training. There was a difference in the use of fundamental frequency between words. In the TG, the mean fundamental frequency of the word "negotiates" was higher than that of "reforms" for women; the mean, minimum, and maximum fundamental frequency values of the word "negotiates" were higher than those of "reforms" for men. The difference between words was expected, due to the aspects of phrase structure previously commented. While "negotiates" was emphasized by fundamental frequency increase, the emphatic accent in "reforms" was characterized by increase in duration. Duration increase frequently occurs in Portuguese to characterize emphasized words, and this resource was used in "reforms", which was considered the word spontaneously chosen by the participants of this research to be emphasized.

Distinct results were found between perceptual, acoustic and visual analysis. The choice of different investigation parameters was due to inherent characteristics to each evaluation that, in the present study, offered different perspectives of analysis. Moreover, as previously discussed, emphatic accent in words occurs by the concomitant use of several resources and, for this research, the most adequate parameters for each type of analysis that seemed to be the most usual in participants' speech were chosen.

Emphases were used distinctly in the selected words in all analyses conducted in this study. Data revealed and confirmed the close relationship between prosodic phenomena and aspects related to phrasal structure and its syntactic and semantic components, to the organization of speech production, to reading style, and also to physical, emotional and cultural characteristics of the speaker.

Another important factor is that, in most evaluations, the groups behaved similarly in both studied moments. Differences between the TG and the UnTG were observed in only three analyses: emphasis evaluation, distribution of pauses, and duration of the studied words, situations in which differences were also observed between recording moments 1 and 2, which corresponded to the beginning and the end of the TG course, respectively. Since this group was better evaluated than the UnTG, training may have been responsible for these differences, contributing to the development of more ability in the use of emphasis resources, improving the use of pauses, and reducing the duration of the emphasized word, which ensured more fluency in radio announcing and better adjustment of these resources to news reading. Other studies have also verified the effects of speech-language pathology training in students of radio training courses. They showed the efficacy of speech-language pathology intervention based on health promotion strategies and practical activities oriented to radio announcing, and verified improvement in the vocal quality and expressiveness of radio announcements after training(17-19).

Therefore, the results of this research showed that voice and speech training influenced the use of prosodic elements in radio announcing, but this was not the only deciding factor in the choice of the voice and emphasis resources used in reading. Its effect may be observed in parameters such as reading fluency, better distribution of pauses, and adequate adjustment of voice resources to news reading, making the radio announcement better accepted by listeners. The finding that the use of emphasis occurs similarly in most analyses reveals that prosodic resources are also defined by biological, psychological and social aspects of speakers. It is a particular choice of the individual, whether or not he is conscious of this strategy, since the emphasis resources carry his emotions, feelings and wills. These non-verbal communication components are, therefore, intuitive, spontaneous and less susceptible to control.

Training does not completely determine the vocal behavior of students, but it modifies, adjusts and improves the use of pre-established emphasis resources. Regardless of training, individuals showed ability to select the desired prosodic elements because they already have some professional experience in the radio announcing area or because they are more expressive than other people. The speech-language pathologist who works with voice professionals to improve their communication cannot be restricted to training the motor skills responsible for voice control and flexibility. His work must involve aspects related to the message content that the speaker wishes to transmit. Elements such as phrasal structure, speech style, speakers' attitude, and emotions involved in speech must be considered when designing strategies for expressiveness training, so that professional communication happens clearly, efficiently and naturally.



Emphasis resources were used similarly by individuals with and without voice and speech training in most of the performed analyses. Therefore, the use of emphasis was a particular choice of the individuals, following personal characteristics.

Voice and speech training partially interfered in the selection of emphasis resources used in previously selected words from a news reading. Trained individuals had more ability to adjust emphatic accent according to the type of text selected, assuring more fluency and adequate distribution of pauses in news reading.

The words "negotiates" and "reforms" were distinctly emphasized according to syntactic and semantic aspects inherent to phrasal structure.



