Statistical analysis of event-related potential elicited by verb-complement merge in Brazilian Portuguese.

Stimulation involving incongruence in the merge operation between verb and complement has often been related to a negative event-related potential (ERP) of augmented amplitude and a latency of ca. 400 ms, the N400. Using an automatic ERP latency and amplitude estimator to facilitate the recognition of waves with a low signal-to-noise ratio, the present study aimed to analyze the N400 statistically in 24 volunteers. Stimulation consisted of 80 experimental sentences (40 congruous and 40 incongruous), generated in Brazilian Portuguese, involving two distinct local verb-argument combinations (nominal object and pronominal object series). For each volunteer, the EEG was simultaneously acquired at 20 derivations, topographically localized according to the 10-20 International System. A computerized routine for automatic N400-peak marking (based on the ascendant zero-cross of the first waveform derivative) was applied to the estimated individual ERP waveform for congruous and incongruous sentences in both series for all ERP topographic derivations. Peak-to-peak N400 amplitude was significantly augmented (P < 0.05; one-sided Wilcoxon signed-rank test) due to incongruence in derivations F3, T3, C3, Cz, T5, P3, Pz, and P4 for the nominal object series and in P3, Pz and P4 for the pronominal object series. The results also indicated high inter-individual variability in ERP waveforms, suggesting that the usual procedure of grand averaging might not be a generally adequate approach. Hence, signal processing statistical techniques that allow waveform analysis with a low signal-to-noise ratio should be applied in neurolinguistic ERP studies.


Introduction
Long-latency evoked potentials usually occur 100 ms after a stimulus and behave according to the context of stimulation, hence being named event-related potentials (ERP). In records of the cortical activity related to language cognition, semantic violations coinciding with the merge operation between the verb and its complement have been related to the enlargement of a negative-going ERP peaking at around 400 ms post-stimulus (N400) (1). This phenomenon has been widely described as a consequence of increased difficulty in morpho-syntactic integration due to the semantic anomaly (2,3).
Three parameters from the resulting averaged ERP waveform are usually analyzed in order to characterize the N400: latency, amplitude and topographic (spatial) distribution (4). The latency of the cortical response is referenced to a specific instant in time (for example, the target stimulus presentation) and usually ranges from 300 to 500 ms. Amplitude is commonly related to the ease of performing morpho-syntactic integration (5,6). Thus, it can also be seen as an inverse function of context: the lower the supporting context for semantic satisfaction, the larger the amplitude of the waveform as a direct result of the integration challenge (7). Amplitude is also inversely correlated with recency and priming effects. Topography is usually reported to be centro-parietally distributed in visually stimulated experiments and more diffuse in acoustic ones (8).
Nonetheless, most of the N400 findings are chiefly related to the merge operation between a verb and an adjacent full noun phrase. Hence, characterizing the neurophysiology of subtly different types of merge already formally and clearly described in the linguistics literature on verbal complementation, such as the merge to a pronoun, is of great theoretical importance to neurolinguistics.
Since the protocols for eliciting N400 in the language context are considerably extensive, only reduced numbers of target sentences can be presented to the subjects, usually 30 (3) or 40 (9) sentences, resulting in individual ERPs with low signal-to-noise ratios. Hence, the classic protocol recommends grand averaging the waveforms for all individual subjects as the best practice (2,9). Nevertheless, we noticed striking differences among ERP waveforms stemming from different individuals in the data collected for the França et al. (10) study. Based on this observation, in a previous study (11) we proposed an alternative to the standard grand averaging: an automatic ERP latency and amplitude estimator as a tool to help treat N400 waveforms per individual, and applied it to the analysis of distinct local verb-complement merge conditions.
In the present study, we compare the statistical results obtained for local verb-nominal object merge (for example, Joe ate sandals) with another local verb-complement merge where the complement receives its semantic properties from a nominal antecedent (local verb-pronominal object merge, for example, Joe bought sandals and ate them). Moreover, we compared the results obtained using the proposed analysis to those obtained by the usual grand-average technique.

Experimental protocol
A computerized grammaticality judgment test was presented to 24 right-handed volunteers (14 males). All volunteers were within the 18-39 age bracket (mean age, 26.3 years), had a complete or partial college education and had been previously screened for systemic diseases and for the current use of antidepressants.
Stimulation consisted of sentences involving the most studied type of verb-argument combinations, i.e., the local one. The verb finds its argument right beside it and incongruence is established by the local incompatibility between the selection requirements of the verb and the semantic properties of the complement. The latter can be semantically meaningful (nominal object series) or receive its semantic properties from a nominal antecedent (pronominal object series).
The selection sub-processes involved in nominal object series include the syntactic-categorical structuring of a phrase marker and the assignment of a conceptually felicitous or infelicitous local thematic role. In turn, the sub-processes involved in pronominal object series include: i) the syntactic-categorical structuring of a phrase marker; ii) then, since the pronoun is devoid of conceptual content, there is a search for an antecedent with which co-reference can be established; iii) finally, similarly to the nominal object series, a conceptually felicitous or infelicitous local thematic role is assigned. Sentences in Table 1 exemplify congruous and incongruous merges.
Volunteers were visually stimulated with 80 experimental sentences (40 congruous and 40 incongruous) and 80 distractor sentences (also, 40 congruous and 40 incongruous) for each series, all in Brazilian Portuguese, displayed in random order. The distractors were mixed with the experimental stimuli and were formulated so as not to present the same types of selection as those in the target stimuli. For instance, some distractors included intransitive verbs or subordinate clauses, despite having the same number of words as the experimental sentences. Subjects read test instructions on the computer screen followed by a warm-up drill that checked their comprehension of the protocol. After the warm-up, subjects could receive additional instructions from the experimenter if any doubts about the protocol persisted. When ready, subjects would start the grammaticality judgment test.
The stimulus sentences were presented kinetically, word-by-word, on the computer screen, commanded by a script written in Presentation 0.5 (Neurobehavioral Systems, Albany, NY, USA). Each word was centrally displayed on the monitor for 200 ms, formatted with white, 20-point, Times New Roman font over a black screen (800 x 600 resolution on a 17" monitor with the subject's eyes about 50 cm distant, resulting in average vertical and horizontal angles of 0.75° and 3.7°, respectively). The sentences had 5 and 8 words for the nominal and pronominal object series, respectively, and the last word was used as target for both series. After the presentation of the last word of each sentence, subjects were requested to judge the stimulus by pressing either the red or the green key on the keyboard for incongruence or congruence, respectively. This response was assessed so that the subjects' attention could be inferred. The wait for a response timed out after 1000 ms. Following the judgment or the time-out, a white fixation cross was displayed for 2000 ms before the first word of the next sentence was presented.
The EEG signal was recorded continuously during the whole experimental session from 20 unipolar derivations. Silver-tip electrodes were positioned according to the International 10-20 System, with averaged-mastoid reference and ground at Fpz. Electrode impedance was controlled to normal values (for EEG, lower than 10 kΩ). The signal was amplified (gain = 18,000) and treated with low-pass (cut-off frequency of 32 Hz) and high-pass filtering (0.8 Hz). The latter removes all slow and constant electrical waves from the signal. All EEG derivations were then digitized with a sampling frequency of 200 Hz (12-bit analog-to-digital resolution) and were stored for off-line processing.

ERP waveform estimation
The original signal of each subject was segmented into epochs of 800-ms duration triggered by the onset of the target words. Then, an algorithm for artifact rejection was applied to each signal epoch. This algorithm consisted of a simple comparison to a threshold, defined as 1.35 times the root mean squared value of an artifact-free individual EEG raw signal. The epochs that presented any sample with an absolute value above this threshold were discarded.
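The segmentation and threshold-based rejection described above can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function names, the single-channel layout and the `clean_reference` argument (standing for the artifact-free raw EEG segment) are assumptions introduced here; only the 200-Hz sampling rate, the 800-ms epoch length and the 1.35 x RMS threshold come from the text.

```python
import numpy as np

FS = 200            # sampling frequency in Hz (from the text)
EPOCH_MS = 800      # epoch duration in ms (from the text)
RMS_FACTOR = 1.35   # threshold multiplier (from the text)


def extract_epochs(eeg, onsets, fs=FS, epoch_ms=EPOCH_MS):
    """Cut one channel into fixed-length epochs starting at each onset sample.

    `onsets` holds the sample indices of the target-word onsets (hypothetical
    input format). Epochs that would run past the end of the record are dropped.
    """
    eeg = np.asarray(eeg, dtype=float)
    n = int(round(epoch_ms * fs / 1000))
    return np.array([eeg[i:i + n] for i in onsets if i + n <= eeg.size])


def reject_artifacts(epochs, clean_reference, factor=RMS_FACTOR):
    """Keep only epochs whose samples all stay within factor * RMS.

    The RMS is computed from `clean_reference`, an artifact-free stretch of
    the same subject's raw EEG (how it is selected is not specified in the text).
    """
    rms = np.sqrt(np.mean(np.square(clean_reference)))
    threshold = factor * rms
    keep = np.all(np.abs(epochs) <= threshold, axis=1)
    return epochs[keep], threshold
```

A typical call would be `kept, thr = reject_artifacts(extract_epochs(channel, onsets), clean_segment)`, applied separately to each derivation and condition.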
The ERP was then estimated by coherently averaging the epochs relative to congruous (or incongruous) EEG response for each electrode site of a subject. Hence, ERPs were time-locked to the onset of the stimulus-trigger for each condition, congruous and incongruous. Due to the analog high-pass filtering, the use of a baseline preset before the stimulus could be avoided. Individual ERPs were then low-pass filtered (cutoff frequency of 7 Hz, 2nd order Butterworth, applied bi-directionally for obtaining a null phase frequency-response, i.e., no phase distortion). Figure 1 shows the resulting ERPs for one of the volunteers, illustrating the high inter-individual waveform variability, which makes the visual recognition of N400 a difficult task. Figure 2 depicts the resulting grand-averaged waveforms regarding all volunteers. Thick and thin lines refer to incongruous and congruous sentences, respectively, and negative waves are plotted upwards in accordance with ERP literature.
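The coherent averaging and zero-phase low-pass filtering can be sketched with SciPy as below. This is an illustration under assumptions: the function name is hypothetical, and whether the authors' "2nd order" refers to each filtering pass (as done here) or to the combined forward-backward response is not stated in the text. Note that forward-backward filtering squares the magnitude response of the single-pass filter.

```python
import numpy as np
from scipy.signal import butter, filtfilt

FS = 200         # sampling frequency in Hz (from the text)
CUTOFF_HZ = 7    # low-pass cutoff in Hz (from the text)
ORDER = 2        # Butterworth order per pass (interpretation, see lead-in)


def estimate_erp(epochs, fs=FS, cutoff_hz=CUTOFF_HZ, order=ORDER):
    """Average time-locked epochs and smooth the result without phase shift.

    `epochs` has shape (n_epochs, n_samples) and contains the accepted epochs
    of one condition (congruous or incongruous) at one derivation.
    """
    epochs = np.asarray(epochs, dtype=float)
    average = epochs.mean(axis=0)                      # coherent average
    b, a = butter(order, cutoff_hz, btype="low", fs=fs)
    return filtfilt(b, a, average)                     # forward-backward pass
```

Because `filtfilt` runs the filter in both directions, peak latencies in the smoothed ERP are not displaced by the filter, which matters when latencies are later read off the waveform.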

Automatic N400 marker and peak-to-peak amplitude estimation
A computerized routine for automatic N400-peak marking was applied to the resulting individual ERP for all derivations, as described in Figure 3. Negative peaks were recognized from the ascendant zero-cross of the first approximate waveform derivative within the 200-600 ms window. This could be carried out using simply the sign of the derivative. If multiple peaks were found within this time window, the most pronounced one was chosen. In cases of very poor waveforms, a manual correction of the estimation was performed. Once the N400 was identified, the amplitude between this wave and the highest positive peak around it was automatically calculated. Table 2 summarizes the mean and standard deviation (SD) of N400 latency in the 20 ERP derivations for both series. In most derivations, latency was slightly longer for incongruous (mean = 356.7 and 422.5 ms, for nominal and pronominal object series, respectively) than for congruous sentences (355.7 and 421.9 ms), whereas the mean latency for the pronominal object series was nearly 20% longer than that for the nominal object series.
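The marking procedure described above can be sketched as follows. Two details are assumptions made for this illustration, since the text does not define them precisely: "most pronounced" is taken as the most negative candidate, and the "highest positive peak around it" is taken as the higher of the two local maxima adjacent to the marked peak. The function name is hypothetical, and flat segments (zero derivative) are not treated specially.

```python
import numpy as np

FS = 200  # sampling frequency in Hz (from the text)


def mark_n400(erp, fs=FS, window_ms=(200, 600)):
    """Return (latency_ms, peak_to_peak) for the N400, or None if no peak.

    Negative voltages are assumed to be stored as negative numbers, so a
    negative peak is a local minimum: the slope changes from - to +.
    """
    erp = np.asarray(erp, dtype=float)
    slope_sign = np.sign(np.diff(erp))   # sign of the first approximate derivative
    minima = np.where((slope_sign[:-1] < 0) & (slope_sign[1:] > 0))[0] + 1
    maxima = np.where((slope_sign[:-1] > 0) & (slope_sign[1:] < 0))[0] + 1

    t_ms = np.arange(erp.size) * 1000.0 / fs
    in_window = (t_ms[minima] >= window_ms[0]) & (t_ms[minima] <= window_ms[1])
    candidates = minima[in_window]
    if candidates.size == 0:
        return None

    # Assumption: the most pronounced negative peak is the most negative one.
    peak = candidates[np.argmin(erp[candidates])]

    # Assumption: compare against the adjacent positive peaks on either side.
    neighbours = []
    left, right = maxima[maxima < peak], maxima[maxima > peak]
    if left.size:
        neighbours.append(erp[left[-1]])
    if right.size:
        neighbours.append(erp[right[0]])
    amplitude = max(neighbours) - erp[peak] if neighbours else float("nan")
    return float(t_ms[peak]), float(amplitude)
```

Returning `None` when no candidate is found leaves room for the manual correction step mentioned in the text.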

Statistical test for the differential waveform
Due to the close values of N400 latencies for congruous and incongruous sentences (Table 2), as well as to the high SD, no statistical comparison was performed for this measure. Moreover, the N400 peak-to-peak amplitude could not be considered normally distributed within all volunteers according to the Anderson-Darling test (P < 0.05). Therefore, a one-sided Wilcoxon signed-rank test (paired, non-parametric) was used to statistically assess the null hypothesis of equal median amplitudes of the congruous and incongruous ERPs (P < 0.05) (13).
Another statistical approach, the running t-test, was employed (14). This test is based on the difference between the individual ERPs of incongruous (I) and congruous (C) sentences for all volunteers. Using the differential waveform (I-C), normality could be assumed and the t-test was then applied on a sample-by-sample basis within the time interval of interest, from 200 to 600 ms. Two one-sided tests comparing (I-C) to zero, i.e., the null hypothesis of zero difference, were performed using a 2.5% level of significance for each side. At a given instant of time, if the null hypothesis was rejected for the positive side, one could infer that the resulting grand average for incongruous sentences would be statistically more positive or less negative than that for congruous sentences. The counterpart reasoning was applied to the instants of time for which the null hypothesis was rejected for the negative side.
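The paired one-sided comparison of peak-to-peak amplitudes can be sketched with SciPy as below. This is an illustrative sketch, not the authors' code: the function name is hypothetical, the inputs are assumed to be one amplitude per volunteer for a single derivation, and the direction of the alternative (incongruous larger than congruous) is inferred from the reported "augmented" amplitude. The normality check is omitted here.

```python
import numpy as np
from scipy.stats import wilcoxon


def compare_amplitudes(incongruous, congruous, alpha=0.05):
    """One-sided paired Wilcoxon signed-rank test for one derivation.

    H0: the paired differences (incongruous - congruous) are symmetric
    around zero; H1: incongruous amplitudes tend to be larger.
    Returns (p_value, significant).
    """
    incongruous = np.asarray(incongruous, dtype=float)
    congruous = np.asarray(congruous, dtype=float)
    result = wilcoxon(incongruous, congruous, alternative="greater")
    return float(result.pvalue), bool(result.pvalue < alpha)
```

Running this once per derivation and per series yields a table of P values of the kind reported in Table 3.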
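The running t-test on the differential waveform can be sketched as follows. This is a minimal illustration with hypothetical names: the inputs are assumed to be arrays of shape (volunteers, samples) holding the individual ERPs of one derivation. The two one-sided tests at 2.5% each are obtained by halving the two-sided P value and checking the sign of the t statistic. No correction for multiple comparisons is applied, since the text does not mention one.

```python
import numpy as np
from scipy.stats import ttest_1samp


def running_t_test(inc, con, fs=200, window_ms=(200, 600), alpha_side=0.025):
    """Sample-by-sample one-sample t-test of (I - C) against zero.

    Returns the time axis (ms) inside the window and two boolean masks:
    samples where I is significantly more negative than C, and samples
    where I is significantly more positive than C.
    """
    diff = np.asarray(inc, dtype=float) - np.asarray(con, dtype=float)
    t_ms = np.arange(diff.shape[1]) * 1000.0 / fs
    in_window = (t_ms >= window_ms[0]) & (t_ms <= window_ms[1])

    t_stat, p_two_sided = ttest_1samp(diff[:, in_window], 0.0, axis=0)
    p_one_sided = p_two_sided / 2.0          # valid in the direction of t_stat
    more_negative = (t_stat < 0) & (p_one_sided < alpha_side)
    more_positive = (t_stat > 0) & (p_one_sided < alpha_side)
    return t_ms[in_window], more_negative, more_positive
```

Contiguous runs of `True` in either mask correspond to time intervals of the kind listed in Table 4.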

Results
Table 3 shows the results (P values) of the Wilcoxon signed-rank test in both series for all derivations. Regarding the nominal object series, only derivations F3, T3, C3, Cz, T5, P3, Pz, and P4 had a significantly increased N400 amplitude (P < 0.05) due to incongruence (bold in Table 3). The same applied to the pronominal object series, but only for derivations P3, Pz and P4.
The results of the running t-test for time intervals between 200 and 600 ms are shown in Table 4. All derivations that picked up augmented N400 amplitude related to semantic incongruence based on the Wilcoxon signed-rank test showed a significant difference (I < C) within the time intervals of 315 to 475 ms for the nominal object series and of 285 to 435 ms for the pronominal object series (bold in Table 4). These intervals contain the respective mean N400 latency for both series (bold in Table 2).

Discussion
Concerning the grand-averaged waveforms (Figure 2), although the results for the nominal object series reproduced those commonly described in the N400 literature, i.e., the semantic anomaly increases wave amplitude, as reported by Osterhout et al. (3), an increase of the depth of the preceding valleys related to incongruous sentences was also noticed. This effect can be explained considering that the nominal series has the verb and object with all their semantic features right next to each other, i.e., all the elements for the verb-object integration are there. The verb has to either reject or accept the object "on the spot". Thus, this configuration imposes a greater integration urgency than if the verb needed to select a displaced object or needed to fetch semantic features of the object that were expressed by a nonlocal word, technically named antecedent, as in the pronominal object series.
Regarding the pronominal object series, since the target merge (second one) is dependent on semantic material already negotiated in the previous verb selection, its range of conceptual options is possibly narrowed. Thus, considering that semantic context is inversely related to amplitude, the perception of the congruous or incongruous material was facilitated, accounting for the lower amplitude of N400 in comparison with the nominal object series.
The centroparietal topographic N400 distribution obtained for the nominal object series has also been reported by Kutas and Kluender (8) for visually stimulated experiments. Nevertheless, frontal and temporal derivations also showed significant changes between congruous and incongruous sentences. Furthermore, the statistics point to the left-medial topography (odd and 'z' derivations) of N400 changes due to incongruence, already visually noted by Friederici et al. (9). In addition, the derivations with the least significant changes (accepting the null hypothesis of no increase) were frontopolar and occipital, an unsurprising result since these areas are not involved in language processing. On the other hand, for the pronominal object series, the significant increase in N400 for incongruous sentences occurred bilaterally only in parietal derivations. This could be due to the need for semantic recall for the complement from short-term memory.
The running t-test showed that the more pronounced negativity for incongruous sentences in the vicinity of 400 ms occurred in more derivations than expected (8). One possible interpretation for this finding is that the differential waveform can result in noticeable values if the original individual ERPs have different latencies, even having close amplitudes. If that does occur, the running t-test becomes less specific. On the other hand, only for the nominal object series, one can infer that some derivations show a significantly higher positivity before and/or after the N400 period, suggesting the presence of P300 and P500.
The deviation from the mean N400 latency calculated from the individual ERP is given in parentheses.
In agreement with our previous study (11), the high variability of inter-individual waveforms, particularly concerning N400 latency (SD of ca. 50 ms for the nominal object series and of 70 ms for the pronominal object series) suggests that the waveform resulting from the common procedure of grand averaging (1,2,7) might not be considered to be the best representation of the set. This can be noticed by comparing Figure 1 to the grand-averaged ERP shown in Figure 2, where N400 is widened and attenuated. Furthermore, from Table 5 it can be noted that the N400 latency of the grand-averaged ERP differs considerably from the mean N400 latency calculated from the individual waveforms, with deviations of up to 35 ms for the nominal object series and 85 ms for the pronominal object series. Although the individual ERPs showed a considerably low signal-to-noise ratio, mainly due to the reduced number of target stimuli, the automatic N400 wave marker approach facilitated this kind of analysis.
The striking inter-individual variability suggests that the use of normalization, such as using the individual peak-to-peak N400 amplitude and triggering the waveforms by the N400 latency instead of by the stimulus onset, would be a more adequate ERP extraction procedure. In this case, when the grand-averaging procedure is applied, it would more accurately depict an averaged waveform and not the intervening effects of inter-individual neurophysiological differences. Based on this procedure, such variability could be reduced within a neighborhood of ± 200 ms wide, which includes the N400 wave.
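The normalization suggested above can be illustrated with a short sketch. It is one possible reading of the proposal, not a procedure reported in the text: each individual ERP is divided by its own peak-to-peak N400 amplitude, re-aligned on its own N400 peak, and averaged over a window of ± 200 ms around that peak. The function name and the input format (peak sample index and amplitude per volunteer, e.g. from an automatic marker) are assumptions.

```python
import numpy as np


def aligned_grand_average(erps, peak_samples, peak_to_peak, fs=200,
                          half_window_ms=200):
    """Grand average of ERPs aligned on the N400 peak and amplitude-normalized.

    erps          : sequence of 1-D individual ERP waveforms
    peak_samples  : N400 peak sample index for each waveform
    peak_to_peak  : N400 peak-to-peak amplitude for each waveform
    Waveforms whose window would fall outside the epoch are skipped.
    """
    half = int(round(half_window_ms * fs / 1000))
    segments = []
    for wave, peak, amp in zip(erps, peak_samples, peak_to_peak):
        wave = np.asarray(wave, dtype=float)
        if peak - half < 0 or peak + half >= wave.size:
            continue                      # window does not fit in the epoch
        segments.append(wave[peak - half:peak + half + 1] / amp)
    if not segments:
        raise ValueError("no waveform had a complete window around its peak")
    return np.mean(segments, axis=0)
```

In the returned waveform the N400 peak sits at the central sample, so latency jitter between volunteers no longer widens or attenuates the averaged peak.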
In the language context, the estimated grand-average ERP is built up from a reduced number of epochs for each subject and from a limited number of subjects. Thus, the resulting waveform might not be considered to be the best representation of the set. On the other hand, the individual ERP waveform has a low signal-to-noise ratio and does present considerably high inter-individual variability, which can affect both amplitude and latency. Despite this variability, the effect of incongruity was investigated by applying two distinct statistical tests. The Wilcoxon signed-rank test applied to the peak-to-peak N400 amplitude provides more reliable results than the use of the running t-test with the differential waveform. Moreover, the use of an automatic N400 wave marker facilitated the identification of individual N400 waves. Based on all of these findings, we emphasize the need for signal processing statistical techniques that allow waveform analysis with a low signal-to-noise ratio in neurolinguistic ERP studies.

Figure 1. Example of resulting individual event-related potentials for all derivations in nominal (A) and pronominal (B) object series. Thick and thin lines refer to incongruous and congruous sentences, respectively.

Figure 2. Example of resulting grand-averaged event-related potentials for all derivations in: A, nominal and B, pronominal object series. Thick and thin lines refer to incongruous and congruous sentences, respectively.

Figure 3. Description of the automatic N400 locating procedure: negative peaks are recognized from the ascendant zero-cross of the first approximate waveform derivative within the 200-600-ms window, which could be done by simply using the sign of the derivative. In case of multiple findings, the most pronounced negative peak is chosen.

Table 1. Examples of congruous and incongruous sentences for both series.

| Condition | Nominal object series | Pronominal object series |
| --- | --- | --- |
| Congruous | Minha professora leu um **livro** (My teacher read a book) | Maria vai limpar a cadeira e também **guardá-la** (Maria will clean the chair and also put it away) |
| Congruous | A moça perdeu o **anel** (The lady lost her ring) | Teresa está escrevendo um livro e tentará **ilustrá-lo** (Teresa is writing a book and will try to illustrate it) |
| Incongruous | Meu primo rasgou a **geladeira** (My cousin tore the fridge) | Eu vou jogar a bola e então **almoçá-la** (I will throw the ball and then lunch it) |
| Incongruous | A mulher cortou o **vácuo** (The woman cut the vacuum) | Ela está misturando o suco e quer **queimá-lo** (She is mixing the juice and wants to burn it) |

Target words are presented in bold.

Table 2. Mean and standard deviation (in parentheses) of N400 latency in all event-related potential (ERP) derivations (in ms).
Values in bold indicate the ERP topographic derivations where N400 amplitude significantly increased (P < 0.05) due to incongruence (Wilcoxon signed-rank test).

Table 3. Results (P value) of the Wilcoxon signed-rank test comparing congruous and incongruous N400 peak-to-peak amplitude in both series.

Table 4. Time intervals where the waveforms for congruous (C) and incongruous (I) sentences differ (in ms).