Modulation rate and age effect on intermittent speech recognition

Purpose: to investigate the auditory recognition of intermittent speech in relation to different modulation rates and ages. Methods: 20 young people, 20 middle-aged adults, and 16 older adults participated, all of them with auditory thresholds equal to or lower than 25 dB HL up to the frequency of 4000 Hz. The participants were submitted to intermittent speech recognition tests presented in three modulation conditions: 4 Hz, 10 Hz, and 64 Hz. The percentages of correct answers were compared between age groups and modulation rates. ANOVA and post hoc tests were conducted to investigate the modulation rate effect, and a mixed linear regression model was used to investigate the age effect (p < 0.001). Results: regarding the age effect, the data showed a significant difference between young people and older adults, and between middle-aged and older adults. As for the modulation rate effect, the percentages of correct answers were significantly lower at the slower rate (4 Hz) in the three age groups. Conclusion: an age effect was verified on intermittent speech recognition: older adults have greater difficulty. A modulation rate effect was also noticed in the three age groups: the higher the rate, the better the performance.


INTRODUCTION
In many social interactions permeated with oral communication, the interlocutor's speech is just one of the sounds present in the environment. Other sounds may mask, at least partially, the speech stimulus one tries to hear. This happens when the time and/or frequency spectrum of the environmental noise coincides with that of speech. In this situation, characterized by low-redundancy hearing conditions, the listener hears only the fragments of speech whose acoustic and temporal characteristics do not coincide with those of the masking noise. The result is the perception of intermittent speech (segmented in time) and/or distorted speech (segmented in the frequency spectrum). Adequate speech recognition in such situations requires the listener to find meaning in speech with countless missing time or frequency windows. This is how the interlocutor's message is perceived and interpreted by the listener when hearing speech concomitantly with noise 1,2.
The difficulty in speech sound recognition in noisy environments increases with advancing age. Sensory hearing loss, common in the older population, is pointed out as one of the causes of their difficulty recognizing speech sounds 2,3. However, studies indicate that, regardless of any deficit in auditory sensitivity, older adults whose hearing is within normal standards have greater difficulty recognizing speech in noisy environments than young listeners 2,4-6.
It is known that temporal masking is related to speech recognition difficulties in noisy environments 7,8. Temporal masking is the change in a sound's threshold in the presence of another subsequent stimulus. This takes place when the duration and intensity of a given stimulus are enough to reduce the sensitivity to another stimulus 9. Studies show that the masking effect is greater in older adults with normal auditory thresholds than in young people who also have normal hearing 10-13. Temporal masking is therefore related to the older adult's greater difficulty recognizing speech in noisy environments 14. Nonetheless, it is not yet understood whether a factor other than the temporal masking effect is related to this difficulty.
One possibility is a natural decline in the ability to recognize low-redundancy speech, more specifically, in this case, intermittent (time-segmented) speech, similar to speech perceived in noisy environments. Hence the research question of this study: does the ability to recognize intermittent speech naturally decline with age?
The hypothesis is that the ability to recognize intermittent speech declines with advancing age. Perhaps, a decrease in this ability's performance may be identified even in middle-aged adults, who have not yet reached senescence. This hypothesis was tested in this study comparing the performance of three age groups (young people, middle-aged adults, and older adults) in intermittent (time-segmented) speech recognition tests.
This study is justified because hearing is an essential part of communication in social interaction and, particularly, because older adults increasingly have hearing difficulties and socialization-related challenges. These factors have increased scholars' interest in the subject in recent years 7,8,15,16. Thus, this study aimed to investigate the auditory recognition of intermittent speech with different modulation rates and at different ages.

METHODS
This is an observational, analytical, cross-sectional study, approved by the Human Research Ethics Committee of the Department of Health Sciences at the Universidade Federal de Pernambuco, Brazil, under number 2.532.384.
The sample comprised 56 participants selected by convenience and divided into three groups: Group 1: with 20 young people aged 18 to 25 years (mean age 21 years), of both sexes (12 were females); Group 2: with 20 middle-aged adults, 45 to 55 years old (mean age 48 years), of both sexes (16 were females); Group 3: with 16 older adults, aged 60 to 77 years (mean age 65 years), of both sexes (14 were females).
To meet the inclusion criteria, the participants had audiometric thresholds equal to or lower than 25 dB HL at the frequencies of 250 to 4000 Hz in at least one ear. All the participants in the groups of young people and middle-aged adults had audiometric thresholds equal to or lower than 25 dB HL at the frequencies of 250 to 8000 Hz, while in the group of older adults only two participants had thresholds worse than 25 dB HL (between 35 and 45 dB) at the frequencies of 6000 and 8000 Hz. The graphic representation of the mean audiometric thresholds in the tested ears of the three participating groups is shown in Figure 1. Individuals with a diagnosis or complaint of any change in the auditory, neurological, psychiatric, or cognitive system that might interfere in any degree with oral communication were excluded from the study.
The subjects invited to participate in the study had the research's objectives and procedures introduced to them and their participation's risk and benefits explained. After accepting to participate in the research, they signed the informed consent form (ICF) and, on a scheduled date, were submitted to pure-tone audiometry to verify whether their audiometric thresholds met the established inclusion criteria.
The participants who met the inclusion criteria were invited to continue with the intermittent speech tests. The speech tests were performed with the participant seated in a sound booth. The speech stimulus was monaurally presented with a Sennheiser HD580 headphone to the right ear. When the left ear's three-frequency mean (500, 1000, and 2000 Hz) was lower than the right one's by 10 dB or more, the test was performed on the left ear. Hence, the participants' best ear was tested. Only the best ear was tested to minimize the influence of fatigue and learning on the results. The sentences were acoustically modified with a speech processor manufactured by Tucker-Davis Technologies, model RZ6, and the MATLAB (Matrix Laboratory) program. The temporal segmentation was conducted with amplitude modulation of the acoustic signal, with a square-wave modulator. The sentences were 100% modulated (total amplitude limitation), resulting in an interrupted, intermittent speech sound. The intermittent speech was randomly presented at 65 dB in three testing conditions, with a different amplitude modulation rate in each of them. The modulation rates tested were 4 Hz, 10 Hz, and 64 Hz. The modulation rate is the number of times per second that the amplitude decreases by 100%. For example, in the modulation at 4 Hz, the amplitude decreased by 100% (one modulation cycle) four times in each second of the sentence. Each of the modulation cycles lasted 250 ms; as the modulator was a square wave, half of this time (125 ms) was characterized by the absence of acoustic information. Hence, in every second of speech, only half of it (500 ms) had the verbal information of the sentence. The segmentation at 4 Hz is illustrated in Figure 2: the rectangular blocks represent the time with verbal information, and the empty spaces represent the time without acoustic information.

Figure 2. Illustration of segmented speech at 4 Hz
The same scheme was followed for the other modulation rates. In the 10 Hz modulation, there were 10 periods per second with lowered amplitude (absence of acoustic information), each one lasting 50 ms. At 64 Hz, each of the 64 periods without information lasted approximately 8 ms. Note that the total time without information is the same in the three testing conditions (500 ms in every second); the difference between them is how the periods with and without sound are distributed in time.
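The timing described above can be checked with a short sketch. It is only an illustration in Python, not the processing actually run on the RZ6 processor and MATLAB: the function names, the 16 kHz sample rate, and the choice of starting each cycle with the audible half are assumptions made here for the example.

```python
def gap_summary(rate_hz, duty_cycle=0.5):
    """Return (cycle_ms, gap_ms, silent_ms_per_second) for a square-wave gate.

    duty_cycle is the fraction of each cycle that keeps the speech;
    0.5 corresponds to the square-wave modulation described in the text.
    """
    cycle_ms = 1000.0 / rate_hz
    gap_ms = cycle_ms * (1.0 - duty_cycle)
    silent_ms_per_second = gap_ms * rate_hz
    return cycle_ms, gap_ms, silent_ms_per_second


def square_wave_gate(rate_hz, duration_s, sample_rate_hz=16000):
    """Return one 0/1 value per sample: 1 keeps the speech, 0 silences it.

    Assumption: each cycle starts with the audible half.
    """
    n_samples = int(round(duration_s * sample_rate_hz))
    gate = []
    for n in range(n_samples):
        phase = (n * rate_hz / sample_rate_hz) % 1.0  # position within the cycle
        gate.append(1 if phase < 0.5 else 0)
    return gate


def apply_gate(signal, gate):
    """100% amplitude modulation: multiply each sample by the 0/1 gate."""
    return [sample * g for sample, g in zip(signal, gate)]


if __name__ == "__main__":
    for rate in (4, 10, 64):
        cycle, gap, silent = gap_summary(rate)
        print(f"{rate:>2} Hz: cycle {cycle:.3f} ms, gap {gap:.3f} ms, "
              f"silent {silent:.0f} ms per second")
```

Under these assumptions the gap is 125 ms at 4 Hz, 50 ms at 10 Hz, and 7.8125 ms (about 8 ms) at 64 Hz, while the silent time per second stays at 500 ms in all three conditions.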
The participants were instructed to repeat the sentences as they heard them; they were also told that the words would be unintelligible at times. The researchers, who remained outside the sound booth, controlled the sentence presentation and registered the participants' errors and omissions.
As each sentence was presented, its text appeared on the researcher's computer screen with all the words highlighted in a shaded rectangle where markings were made. The words of the sentence the participant did not repeat or incorrectly repeated were marked with the mouse and counted in the program.
The participants were tested three times at each modulation rate (4, 10, and 64 Hz), resulting in three percentages of correct answers for each rate. In each test, 25 sentences were used (equivalent to one whole HINT list plus five sentences of the following one). The test sequence took place as follows: a) the modulation rate to be tested was drawn; b) the test began with list 1, sentence 1; after 25 sentences, when list 1 had been completely used as well as five sentences of list 2, the first test was finished and the first percentage of correct answers was obtained; c) the modulation rate was drawn once again; d) the test resumed with list 2, sentence 6; after 25 sentences had been presented (15 from list 2 and 10 from list 3), another percentage of correct answers was obtained; and so forth. Thus, the sentence presentations were randomized between modulation rates and rate presentation sequences, eliminating the fatigue bias associated with the last presentations and a possible bias caused by an easier or tougher list. Also, the number of available sentences in HINT (240) made it possible not to repeat them, eliminating the learning bias.
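The bookkeeping of this sequence can be sketched as follows. This is a hypothetical Python illustration, not the program used in the study: the names are invented, the list length of 20 sentences is inferred from "one whole HINT list plus five sentences", and shuffling a pool of nine tests is only one assumed way of drawing the rate before each test.

```python
import random

SENTENCES_PER_LIST = 20   # inferred: 25 sentences = one whole list plus five
SENTENCES_PER_TEST = 25
RATES_HZ = (4, 10, 64)
TESTS_PER_RATE = 3


def sentence_position(running_index):
    """Map a 0-based running sentence index to a 1-based (list, sentence) pair."""
    list_number = running_index // SENTENCES_PER_LIST + 1
    sentence_number = running_index % SENTENCES_PER_LIST + 1
    return list_number, sentence_number


def build_schedule(seed=None):
    """Return one (rate_hz, first_position, last_position) tuple per test.

    Sentences are consumed in fixed order; only the rate order is random,
    so no sentence is repeated and no list is tied to a single rate.
    """
    rng = random.Random(seed)
    order = [rate for rate in RATES_HZ for _ in range(TESTS_PER_RATE)]
    rng.shuffle(order)  # assumed stand-in for drawing the rate before each test
    schedule = []
    for test_number, rate in enumerate(order):
        first = test_number * SENTENCES_PER_TEST
        last = first + SENTENCES_PER_TEST - 1
        schedule.append((rate, sentence_position(first), sentence_position(last)))
    return schedule


if __name__ == "__main__":
    for rate, first, last in build_schedule(seed=1):
        print(f"{rate:>2} Hz: list {first[0]} sentence {first[1]} "
              f"to list {last[0]} sentence {last[1]}")
```

With these assumptions the nine tests use 225 sentences, which fits within the 240 available; the first test ends at list 2, sentence 5, and the second runs from list 2, sentence 6 to list 3, sentence 10, as in the description above.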
During the test, the researcher did not provide any clue to the participant regarding their performance. They were given breaks to rest whenever needed.
The percentages of correct answers were computed with the MATLAB program. The calculation was based on the number of words used in the 25 tested sentences (the total number of words varied between sentences). Afterward, the arithmetical mean of the three percentages was calculated for each modulation rate; this mean value was the result considered for statistical analysis. The purpose of obtaining three percentages of correct answers was to achieve a better representativity of the participant's performance in the modulation rate tested.
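The scoring just described reduces to two small calculations, sketched here in Python rather than in the MATLAB program the study used. The function names and the numbers in the usage lines are hypothetical and serve only to illustrate the arithmetic.

```python
def percent_correct(total_words, missed_words):
    """Word-level score for one test of 25 sentences.

    total_words: number of words in the 25 sentences presented.
    missed_words: words omitted or repeated incorrectly by the participant.
    """
    if total_words <= 0:
        raise ValueError("total_words must be positive")
    if not 0 <= missed_words <= total_words:
        raise ValueError("missed_words must lie between 0 and total_words")
    return 100.0 * (total_words - missed_words) / total_words


def rate_score(test_percentages):
    """Arithmetic mean of the three percentages obtained at one modulation rate."""
    if len(test_percentages) != 3:
        raise ValueError("expected exactly three percentages per rate")
    return sum(test_percentages) / len(test_percentages)


if __name__ == "__main__":
    # Hypothetical word counts for three tests at one rate.
    scores = [percent_correct(120, 30), percent_correct(125, 25), percent_correct(110, 11)]
    print(scores, rate_score(scores))
```

Because the total number of words differs between sets of 25 sentences, each percentage is computed against its own word count before the three values are averaged.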
The data were analyzed with STATA/SE 12.0 and Excel 2010. All the tests were applied with 95% confidence (p < 0.05). To verify the normality of the data, the Kolmogorov-Smirnov test was used with quantitative variables, resulting in normally distributed variables at 4 Hz, whereas at 10 and 64 Hz the variables had non-normal distribution.

RESULTS
The percentages of intermittent speech recognition were described by age groups and modulation rates. To investigate the age effect, the data were compared between the three age groups, according to the different modulation rates. These results are described in Table 1.
To investigate the modulation rate effect, the analysis of variance (ANOVA) was conducted with Tukey's post hoc test (for normal distribution) and the Kruskal-Wallis test with Dunn's multiple comparisons (for non-normal distribution). To investigate the age effect, the method used for repeated measures was the mixed linear regression model, which considers the possible relationship between the response values with repeated measures. The data show a significant difference between young people and older adults, and between middle-aged and older adults. The group of young people had a performance similar to that of the group of middle-aged adults in all the testing conditions (4 Hz, 10 Hz, and 64 Hz). Regarding the participants' schooling level, all those in the group of young people reported they had finished high school. The group of middle-aged adults had 15 participants with a bachelor's degree, four who finished high school, and only one with unfinished high school. In the group of older adults, four had a bachelor's degree, five had finished high school, five had finished middle school, and two had not finished middle school.
The group with the highest schooling level was that of the middle-aged adults. Nonetheless, this group had lower percentages than the group of young people in the 4 Hz modulation. These results suggest that the participants' schooling did not influence performance, probably because the sentences used in the hearing task are simple ones, commonly used in colloquial Brazilian Portuguese 17.
The results of the analysis of the modulation rate effect on intermittent speech recognition are described in Table 2.
When the mean test performance at each modulation rate is compared across all the participants (grand average), a significant difference between the three tested rates is observed. When the modulation rates are compared within each age group separately, the only comparison without a significant difference is that between the 10 Hz and 64 Hz rates in the middle-aged adults.

DISCUSSION
One of the hypotheses of the present study is the influence of age on intermittent speech recognition. In other words, the hypothesis is that, due to the decline in the temporal processing skills, older people have greater difficulty to recognize intermittent speech. The findings in this study confirm such a hypothesis. In all the tested modulation rates, the mean percentage of correct answers decreased as the age increased.
Shafiro and collaborators 18 investigated the influence of age and hearing loss on interrupted speech intelligibility. The sentences were modulated at rates between 0.5 and 24 Hz, and the results demonstrated an existing relationship between speech recognition, age, and hearing loss. Kidd and Humes 19 also investigated the effect of age and hearing loss on intermittent speech recognition. Young people with normal hearing and older adults with and without hearing loss were presented target words both alone and within the context of a sentence, using a series of interruption patterns in which parts of the speech were replaced with silence. The results demonstrated that both age and hearing loss affect the perception of intermittent speech.
The data of the present study corroborate the findings of Shafiro and collaborators 18 and Kidd and Humes 19 concerning age. In both studies, as well as in the data here presented, the results were better for the groups of young people than for the older adults.
The effect of age on the older adults' peripheral and central hearing is widely documented. Physiological changes in their auditory system, as well as cognitive and memory difficulties, explain the decline in sound processing skills, particularly speech recognition in noisy environments 5,20,21.
Saija et al. 22 observed the older adults' difficulty in cognitive tasks using degraded phonemes. However, the sample studied had presbycusis, which made it impossible to determine whether their difficulty was due to peripheral or central limitations. The older people studied here did not have hearing loss. All the same, their performance was inferior to that of the young people and middle-aged adults.
It is known that the decline in auditory sensitivity due to presbycusis has a significant role in speech recognition, especially with low-redundancy speech. However, the older adults' data in this study show that the central auditory system must also be related to the decline in intermittent speech recognition skills. In the studied group, only two older adults had a downsloping audiometric configuration, reaching thresholds between 35 and 45 dB HL at the frequencies of 6000 and 8000 Hz. Hence, it can be said that, in terms of peripheral hearing, the groups compared here were equivalent. It is believed, then, that changes in the central auditory skills play an important role in these results.
Another aspect that may be related to the performance of the group of older adults in this study is their schooling level. It was the only group containing participants with unfinished high and middle school. Although the sentences in the Brazilian version of the HINT test are colloquially used and simple for native Brazilian Portuguese speakers, the possibility of the schooling's influence on speech recognition cannot be discarded. Studies with more robust samples can investigate this variable.
On the other hand, aspects arising from advancing age can bring a positive contribution. Studies show that older people can at least partially overcome their hearing difficulties with the linguistic competence acquired throughout life. The experience with their language's pragmatics and semantics promotes compensatory mechanisms that help them communicate in complex, low-redundancy sound environments 22-24. This is consistent with the idea that the top-down processes (a term used to characterize the contribution of the cortical processes to the reception of stimuli) can help supply acoustic information that has been lost, making it easier to focus attention and process the speech signal 25. The data of the present study do not show evidence that these processes have contributed to degraded, intermittent speech recognition, because the older adults' performance was inferior to that of the other groups when the modulation rate was presented at 4 Hz. However, the absence of this contribution cannot be stated.
The difficulty to recognize intermittent speech may be also related to the decline in cognitive processes 26 .
In an initial interview, the older participants of the present study reported not having any complaints or symptoms possibly related to cognitive aspects. However, they were not submitted to specific tests to dismiss changes in cognitive skills. Certainly, not knowing this group's cognitive performance, due to a methodological limitation, represents a bias in data interpretation. Hence, it is suggested that future studies investigate the relationship between performance in intermittent speech recognition and cognitive aspects, particularly in the older population.
It is not known precisely when, or in what age group, the age-related difficulties start appearing. In this sense, the results found here for the group of middle-aged adults are an interesting finding. In the three testing conditions, this group's results were similar to those found in the group of young people, while different from those found in the group of older adults. This group's proximity to that of young people, in terms of performance in intermittent speech recognition, suggests that the decline in the skills needed to close the gaps in the acoustic information of intermittent speech may not begin until shortly before senescence.
Contrary to what has been found here, Grose, Mamo, and Hall 12 point to the onset of temporal processing decline relatively early in the aging process. The authors refer to studies conducted with middle-aged adults whose peripheral hearing is normal and conclude that changes in auditory processing start before senescence.
The second hypothesis of this research was the existence of a modulation rate effect on intermittent speech recognition. The mean percentages of correct answers were compared between the tested rates in all the participants (grand average) to investigate this effect. The response pattern found confirms the hypothesis. As the modulation rate increased, the percentages of correct answers increased in the three tested groups.
The improvement in intermittent speech recognition at higher modulation rates can be explained by the shorter time intervals without acoustic information at those rates. That is, as the rate increases, the number of cuts in the acoustic information within the same time interval (1 second) increases. For instance, at 4 Hz there are four cuts in one second, whereas at 64 Hz there are 64 cuts. Even though there are more cuts at higher rates, each cut is briefer (about 8 ms at 64 Hz versus 125 ms at 4 Hz), so less acoustic information is lost in each one and the sentence is easier to understand, although the total time without information per second is the same in all conditions. Conversely, each cut removes a longer continuous stretch of acoustic information from sentences segmented at lower modulation rates.
This effect seems to be more pronounced at rates lower than 10 Hz (in the case investigated here, at 4 Hz). The percentages of correct answers at 10 Hz and 64 Hz were high (from 88.72% to 99.69%), near an ideal hearing situation, while the values obtained at 4 Hz were lower than 74%, even reaching 55.92% (recognizing little more than half of what had been said) in the group of older adults. Performance tends to be lower at slow interruption rates (in this case, 4 Hz), in which the listeners face longer stretches without acoustic information. Such an absence may result in greater lexical uncertainty and make it harder to benefit from contextual cues.
The task of recognizing speech in masking noise with oscillating amplitude may be similar to the task of recognizing intermittent speech. In both situations, the listener faces the absence of acoustic information in the target sentence. However, although the low-redundancy hearing situation is similar between the two testing conditions, the presence of the masking noise causes the forward masking and backward masking effects, in which the speech's acoustic information in the noise's low-amplitude time intervals is still interfered with by the masking noise. In this situation, the time intervals with speech information are even more decreased than what is presumed when analyzing only the modulation rate.
Advíncula et al. 27 investigated the performance of young listeners in sentence recognition (Brazilian HINT test version) in masking noise oscillating in different modulation rates: 4 Hz, 8 Hz, 10 Hz, 16 Hz, 32 Hz, and 64 Hz. The findings show similar performance for the rates of 4 Hz to 32 Hz, while the results at 64 Hz were worse than at the other investigated rates. It is important to note that the pattern found by them is the opposite of the pattern found in the present study because the modulation took place in the masking noise instead of the speech itself. Therefore, speech recognition performance in modulated noise worsens as the modulation rate increases.
Although the patterns are inverted, the logic is the same. The authors explain that the higher rate (64 Hz) brings the modulated noise physically closer to the stable noise. That is, the time intervals in which the noise has its amplitude decreased are shorter, making it more difficult to perceive the acoustic signals of speech. The results found here follow this same line of thought. Better speech recognition percentages are found for the testing conditions with less acoustic information loss.
It is important to clarify that, although the results of the group of older adults (particularly in the 4 Hz rate) were lower than those of the other groups, the percentages found here do not necessarily represent difficulties in social hearing. The investigation of the extent to which the percentages in this study interfere with daily communication requires studies with a methodology appropriate to that end. Another factor not explored in this study was whether there is a predominant side of the auditory cortex in the intermittent speech recognition task. Further research is suggested to explore this issue.

CONCLUSION
The findings showed an age effect on segmented speech recognition: older adults had greater difficulty recognizing intermittent speech. A modulation rate effect was also noticed in the three age groups: the higher the rate, the better the performance.