Forward masking with frequency-following response analyses

Purpose: to analyze forward masking in young people with normal hearing using frequency-following responses. Methods: the synthetic syllable /da/ was used to record responses from ten individuals in the following conditions: /da/ with no masking, and /da/ presented 4, 16, 32, and 64 milliseconds after masking. Repeated-measures ANOVA (F-test) with the Greenhouse-Geisser correction was applied to compare testing conditions. For significant differences, multiple comparisons (between pairs of conditions) with the Bonferroni correction were used. Data normality was verified with the Shapiro-Wilk test, and the significance level was set at 5%. Results: wave latencies in all masking conditions were compared with those in the no-masking condition. A latency delay was observed in the transient region of the response (PV and A) in all masking conditions, except for 64 milliseconds. Latency delay also occurred for waves PW, PX, and PY, which correspond to the sustained region of the response. Conclusion: forward masking was observed using frequency-following responses to the /da/ syllable at four intervals (4, 16, 32, and 64 ms) of preceding masking. Forward masking was more evident in the transient region of the response than in the sustained one. This study highlights the importance of electrophysiological testing in temporal processing assessment.


INTRODUCTION
Auditory processing involves all connections from the cochlea to the auditory cortex. It is responsible for sound localization and lateralization, auditory discrimination, auditory pattern recognition, temporal hearing (temporal resolution, temporal masking, temporal integration, and temporal ordering), and auditory performance with competing and degraded acoustic signals 1. When these abilities are well developed, auditory processing is adequate, whereas difficulties in them may cause problems in understanding speech sounds 1,2.
Most social listening situations require the listener to recognize speech in background noise. However, understanding speech in noise, which is related to auditory temporal processing (specifically temporal masking), is a challenge for some listeners. The aspects involved in understanding speech in noise have been broadly investigated, and related complaints are frequent, even among people with normal hearing 3. The ability to recognize speech in noise is based on the temporal perception of sounds; therefore, it is also related to auditory temporal processes 4.
Temporal processing is the ability of the auditory system to perceive or distinguish different stimuli in a transient temporal sequence. It encompasses four categories: temporal ordering, temporal resolution, temporal masking, and temporal integration 1. Temporal masking is the change in the threshold of a sound due to the presence of another sound. When the target speech and the masking noise are perceived at the same time, the effect is called simultaneous temporal masking; when the masking noise is perceived a few milliseconds before the target speech, it is called forward masking; and when the masking noise occurs after the target speech, it is called backward masking 5.
In forward masking, the effect of the masking noise persists in the auditory system for a few milliseconds after the noise has physically ceased or decreased in amplitude, leading to a change in the perception of the speech that follows 3. This effect occurs when the speech signal and the masking noise are separated by different intervals. It may happen because the hair cells, after being stimulated by the masking noise, require a few milliseconds to recover their sensitivity before they can be stimulated by the subsequent speech sound. The magnitude of this recovery depends on several characteristics, such as the interval between noise and speech 3.
Electrophysiological measures have been used to study auditory processing 5, with various stimuli, including speech sounds, used to elicit a response 6. The speech-evoked auditory brainstem response (sABR), elicited with a syllable as the stimulus (usually /da/), reflects the transient and sustained components of the syllable (/d/ and /a/, respectively). A series of positive and negative peaks is observed for both components. The sustained region of the response is called the frequency-following response (FFR) because of its periodic characteristics. However, the term FFR has also been commonly used to refer to speech-evoked auditory brainstem responses as a whole, including both the transient and sustained portions 6-8.
Considering the /da/ syllable, the consonant /d/ is its initial (onset) and transient component, and the vowel /a/ is its sustained component. According to Skoe and [...]
Despite the great number of electrophysiological studies with speech stimuli, few investigations have used forward masking to influence the response. This study aimed to analyze forward masking in young people with normal hearing using frequency-following responses (FFR).

METHODS
This cross-sectional observational study was approved by the Research Ethics Committee at the Federal University of Pernambuco (Universidade Federal de Pernambuco, UFPE), Brazil, under protocol number 1.727.677.
Ten individuals, six of whom were female, participated in this study. They were 18 to 25 years old (mean age 21 years), with normal hearing (pure-tone thresholds ≤ 25 dB HL at frequencies from 250 to 8000 Hz) and no history of speech and/or neurological disorders.
Any concerns regarding auditory processing disorders were investigated in the interview.
All FFRs were recorded with the Intelligent Hearing System (IHS), with participants inside a sound booth. The skin was prepared with abrasive paste, and electrodes were placed according to the International 10-20 System, as follows: two inverting electrodes at the mastoids (M1 and M2), a non-inverting electrode at Fz, and the ground electrode at Fpz. Both the speech stimulus /da/ and the masking noise were delivered to the right ear via insert earphones (E39). The stimulus rate was 3.77/s, the recording window was set at 70 ms, and the high- and low-pass filters were set at 50 Hz and 3000 Hz, respectively. For each trace, a total of 9,000 sweeps were acquired in three replicable runs of 3,000 sweeps. The ipsilateral channel was used to analyze the waves. In the resulting tracings, wave PV (positive peak) and wave A (negative peak) of the transient region were identified and analyzed, as well as waves PW, PX, PY, PZ, and O of the sustained region. Absolute latencies of all peaks were determined and analyzed by two audiologists with expertise in electrophysiological exams.
The synthetic syllable /da/ and a speech-shaped noise (SSN) were used. The syllable contains a transient component (the consonant /d/) and a sustained component (the vowel /a/). It lasted 40 ms and was presented at 75 dB peSPL in alternating polarity. The masking noise, whose spectrum included Portuguese speech frequencies, was developed at the University of North Carolina at Chapel Hill. It was presented at a fixed intensity of 80 dB SPL and lasted 100 ms (10 ms onset/offset ramps).
Statistical analysis was performed to compare the tested conditions: the condition without noise was labeled "Unmasked", and those with masking noise were labeled "4ms", "16ms", "32ms", and "64ms", referring to the delay between noise and syllable.
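The timing of each masked condition can be sketched in a few lines of Python. The sketch rests on two assumptions that the text does not state explicitly: the labeled delay is measured from masker offset to syllable onset, and the stimulus rate of 3.77 denotes presentations per second (so that one presentation cycle lasts about 265 ms). All names are illustrative.

```python
NOISE_MS = 100.0       # masker duration reported in the Methods
SYLLABLE_MS = 40.0     # /da/ duration reported in the Methods
GAPS_MS = (4, 16, 32, 64)
RATE_PER_S = 3.77      # ASSUMPTION: rate read as presentations per second


def trial_timeline(gap_ms):
    """Return syllable onset/offset (ms) relative to masker onset.

    ASSUMPTION: the gap runs from masker offset to syllable onset.
    """
    syllable_onset = NOISE_MS + gap_ms
    syllable_offset = syllable_onset + SYLLABLE_MS
    return {
        "gap_ms": gap_ms,
        "syllable_onset_ms": syllable_onset,
        "syllable_offset_ms": syllable_offset,
    }


# Duration of one presentation cycle implied by the assumed rate.
period_ms = 1000.0 / RATE_PER_S

for gap in GAPS_MS:
    t = trial_timeline(gap)
    spare = period_ms - t["syllable_offset_ms"]
    print(
        f"{gap:>2} ms gap: syllable from {t['syllable_onset_ms']:.0f} "
        f"to {t['syllable_offset_ms']:.0f} ms, "
        f"{spare:.1f} ms left in the cycle"
    )
```

Under these assumptions, even the longest condition (masker, 64 ms gap, syllable) ends before the next presentation cycle begins.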
A descriptive analysis was performed for mean, standard deviation, coefficient of variation, minimum, and maximum value. Inferential analysis was performed with F-test (ANOVA) for repeated measures, with the Greenhouse-Geisser correction to compare testing conditions. For significant differences, multiple comparisons (between pairs of conditions) and the Bonferroni correction were used. Data normality was verified by applying the Shapiro-Wilk test.
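As an illustration of the inferential step, the following minimal Python sketch computes the repeated-measures F statistic and the Greenhouse-Geisser epsilon that shrinks its degrees of freedom. It is not the SPSS procedure used in the study: the data are a hypothetical toy matrix (rows are participants, columns are conditions), and the function names are illustrative.

```python
def rm_anova(data):
    """One-way repeated-measures ANOVA.

    data: list of rows, one per participant; each column is a condition.
    Returns (F, df_conditions, df_error), uncorrected.
    """
    n, k = len(data), len(data[0])
    grand = sum(sum(row) for row in data) / (n * k)
    cond_means = [sum(row[j] for row in data) / n for j in range(k)]
    subj_means = [sum(row) / k for row in data]
    ss_cond = n * sum((m - grand) ** 2 for m in cond_means)
    ss_subj = k * sum((m - grand) ** 2 for m in subj_means)
    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_error = ss_total - ss_cond - ss_subj
    df_cond, df_error = k - 1, (k - 1) * (n - 1)
    f_value = (ss_cond / df_cond) / (ss_error / df_error)
    return f_value, df_cond, df_error


def greenhouse_geisser_epsilon(data):
    """Epsilon from the double-centered covariance matrix of the conditions."""
    n, k = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(k)]
    cov = [
        [
            sum((row[i] - means[i]) * (row[j] - means[j]) for row in data)
            / (n - 1)
            for j in range(k)
        ]
        for i in range(k)
    ]
    row_means = [sum(cov[i]) / k for i in range(k)]  # cov is symmetric
    grand = sum(row_means) / k
    centered = [
        [cov[i][j] - row_means[i] - row_means[j] + grand for j in range(k)]
        for i in range(k)
    ]
    trace = sum(centered[i][i] for i in range(k))
    sum_sq = sum(v * v for row in centered for v in row)
    return trace ** 2 / ((k - 1) * sum_sq)


# Hypothetical toy latencies (NOT data from the study).
toy = [
    [1.0, 2.0, 3.0],
    [2.0, 3.0, 5.0],
    [3.0, 4.0, 4.0],
]

f_value, df1, df2 = rm_anova(toy)
eps = greenhouse_geisser_epsilon(toy)
print(f"F = {f_value:.3f}, uncorrected df = ({df1}, {df2})")
print(f"epsilon = {eps:.3f}, corrected df = ({eps * df1:.2f}, {eps * df2:.2f})")
```

The p-value would then be read from an F distribution with the corrected degrees of freedom (for example with `scipy.stats.f.sf`, if SciPy is available). Epsilon ranges from 1/(k-1) to 1, where k is the number of conditions, so the correction can only make the test more conservative.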
Statistical significance was set at 5%. Statistical analyses were performed using the Statistical Package for the Social Sciences (SPSS), version 23.

RESULTS
Descriptive results for wave latencies are given in Table 1. The mean latencies for PV, A, PY, PZ, and O were lower in the unmasked condition than in the masked conditions. PW latency was lower in the 64 ms masked condition than in the unmasked condition (22.03 and 22.05 ms, respectively). PX latency had the same mean value in the unmasked condition and in the 64 ms masked condition (30.65 ms). Among the masked conditions, latencies of all waves were highest in the 4 ms condition and lowest in the 64 ms condition (except for wave PZ). Variability, expressed by the coefficient of variation, was low (< 33.3%).
Significant differences in latency values were observed for most of the masked conditions when compared with unmasked latencies, except for the PZ wave, which had no significant differences in any of the masking conditions.
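The Bonferroni step behind such pairwise comparisons can be illustrated as follows. This is a sketch under assumptions: the family is taken to be all pairs among the five conditions, the raw p-values are hypothetical placeholders rather than study results, and the adjustment shown (multiplying each p-value by the number of comparisons and capping it at 1) is the common textbook form.

```python
from itertools import combinations

CONDITIONS = ("Unmasked", "4ms", "16ms", "32ms", "64ms")
ALPHA = 0.05  # significance level stated in the Methods


def bonferroni_adjust(raw_p, n_comparisons):
    """Multiply each p-value by the family size, capping the result at 1."""
    return {pair: min(1.0, p * n_comparisons) for pair, p in raw_p.items()}


# ASSUMPTION: the family contains every pair of conditions.
PAIRS = list(combinations(CONDITIONS, 2))

# Hypothetical raw p-values for one wave (NOT results from the study).
raw_p = {
    ("Unmasked", "4ms"): 0.001,
    ("Unmasked", "16ms"): 0.004,
    ("Unmasked", "32ms"): 0.020,
    ("Unmasked", "64ms"): 0.200,
}

adjusted = bonferroni_adjust(raw_p, len(PAIRS))
for pair, p_adj in adjusted.items():
    verdict = "significant" if p_adj < ALPHA else "not significant"
    print(f"{pair[0]} vs {pair[1]}: adjusted p = {p_adj:.3f} ({verdict})")
```

Comparing adjusted p-values with 5% is equivalent to comparing raw p-values with 5% divided by the number of comparisons.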

DISCUSSION
The ability to recognize speech in noisy environments requires the auditory system to distinguish the target sound from the noise, especially when the signal-to-noise ratio is small 3, which can cause changes in speech perception. In noisy situations, the auditory system might not process the transitory elements of speech, as it is a complex sound with variations in frequency and amplitude 9. Speech properties start to be processed at subcortical levels, in the brainstem nuclei; hence, recording the electrical responses of the brainstem provides accurate data on word processing 10.
For the unmasked condition, the latencies of the transient component (PV and A) were similar to those previously reported. In the present study, the latency was 7.1 ms for PV and 8.4 ms for A, compared with previously reported latencies of 6.53 ms for wave V and 8.0 ms for wave A, and of 6.61 ms for wave V and 7.5 ms for wave A 11. This suggests an established brainstem response pattern for the transient region of the response to a speech stimulus.
In forward masking (i.e., masking preceding the speech stimulus), the latencies of the waves elicited by the transient region of the stimulus (PV and A) were significantly delayed when the /da/ syllable was presented 4, 16, 32, and 64 ms after the noise, as compared with the unmasked responses for the same waves (except at 64 ms for wave A).
A greater simultaneous masking influence on the transient region of the response has been reported 11 . This may be due to the transient characteristics of the consonants, which exhibit more vulnerable components when exposed to noise. In noisy conditions, amplitude responses are smaller and do not show a high temporal periodicity 7 . Moreover, wave responses are temporally closer to the end of masking noise, and this may also contribute to a greater interference of the masking 9 .
Greater PV latency delays relative to unmasked responses have also been reported, suggesting a greater masking effect. Fogerty et al. 12 conducted a study using FFR with 18 syllables (combining the consonants b, d, g, j with the vowels a, i, u to obtain consonant-vowel and vowel-consonant syllables) in the following conditions: unmasked, with simultaneous masking, and with masking preceding the stimulus (forward masking) at 10, 40, and 100 ms. They reported that mean latencies increased in both parts of the response, with a greater increase in the transient region than in the sustained region.
To understand the forward masking effect on PV latencies, Walton et al. 13 used tone-burst stimuli (at frequencies of 1000, 4000, and 8000 Hz) in the following conditions: unmasked, and with masking preceding the stimulus at 2, 4, 8, 16, 32, and 64 ms. Latencies increased at 4, 8, and 16 ms as compared to the unmasked condition. PV latency returned to values similar to those of the unmasked condition when the masking preceded the stimulus by 64 ms, as found in the present study. Similar results were also described 5,7,11,14 with the /da/ syllable in unmasked and masked conditions; a significant latency delay was found in the transient region of the response when tested with background noise.
The consonants (which are the transient component of the syllable) seem to be more susceptible to the masking noise effect, as they have low-intensity acoustic cues and do not have high temporal periodicity 11 . A latency delay on the transient region of the response (waves PV and A) suggests a forward masking effect.
Psychoacoustic studies 3,5,9,15 have also demonstrated the forward masking effect as an increase in hearing thresholds for a short target stimulus preceded by masking: the smaller the interval between the masking noise and the target speech, the higher the thresholds.
Grose et al. 15 have also investigated forward and backward masking effects on young and middle-aged people and found higher thresholds in all masked conditions in relation to age. This may explain why it is more difficult for older adults to understand speech in environmental noise.
Dubno et al. 16 presented a masking noise both simultaneously with the target stimulus and 10, 20, 50, and 100 ms before it. Thresholds also increased in both masking conditions, evidencing the forward masking effect.
Forward masking has been widely documented by psychometric and electrophysiological studies, showing greater interference in the transient region of the response, with a minor effect on the sustained part of the FFR 17.
In the present study, changes were observed in wave latencies related to the sustained region (PW-O). However, these changes were less significant and more inconsistent. It may be difficult to identify a forward masking effect in this region of the response because of the interaction between the masking effect of the preceding noise and a masking effect caused by the stimulus itself (due to its more complex characteristics) 18,19 .
Latencies of the sustained region of the response in the unmasked condition have been previously demonstrated 13. In the present study, when the stimuli were presented at different intervals (4, 16, 32, and 64 ms) after the masking noise, latencies differed for the four masked conditions. However, no significant differences were found in the sustained masked responses when compared with the unmasked condition, perhaps because of differences in the number of sweeps between the two studies, as a large number of sweeps was used here. Furthermore, changes in fundamental frequency can cause changes in the sustained region. The fundamental frequency of the stimulus plays an important role in subcortical coding, facilitating (or confusing) perception of the sustained component of the syllable. However, it is known that vowels (the sustained component) contain intense acoustic cues with higher periodicity and are less influenced by noise 18,19.
Speech coding at subcortical level suggests that speech perception may be influenced by decoding temporal aspects of speech and that this ability is already perceived at brainstem level, which is essential for good comprehension of complex sounds such as speech 10 .
In summary, results of the present study reinforce that the transient component of the stimulus (i.e., consonant) is more susceptible to noise. This was observed in wave PV and A latency delay. On the other hand, the sustained component, represented by PW to O latencies, seems to be less influenced by noise, possibly because vowels are more intense and have higher temporal periodicity, which favors their perception during forward masking. In other words, although a latency delay has been observed in the sustained region after noise, forward masking effect was stronger on the transient component of the stimulus.
FFR seems to be an objective measure to understand forward masking in different populations, especially when using a 4 ms delay between the syllable and the noise.

CONCLUSION
Forward masking was observed using frequency-following responses (FFR) to the /da/ syllable at four intervals (4, 16, 32, and 64 ms) of preceding masking. Forward masking was most evident in the transient region of the response (which corresponds to the consonant /d/). In the sustained region (corresponding to the vowel /a/), forward masking was noticed in the 4 and 16 ms testing conditions, and was less evident in the 32 and 64 ms ones. Therefore, no clear pattern of forward masking on the sustained portion of the FFR was found.