Overtraining increases the strength of equivalence relations

The present study investigated whether overtraining of the conditional discriminations that are the prerequisites of equivalence class formation strengthens the relations among stimuli in an equivalence class. Two groups of college students formed equivalence classes that consisted of faces that expressed emotions (A) and arbitrary stimuli (B, C, D, and E). The overtraining group had twice as many training trials as the regular training group. For participants who formed equivalence classes, relational strength was evaluated by the generalization of expressed emotions from the A to the D stimuli, which was measured using a semantic differential. An untrained control group showed semantic differential scores that were positive for happy faces, negative for angry faces, and neutral for the D stimuli. For the experimental groups, the D stimuli, when included in equivalence classes, produced scores that were similar to those produced by the equivalent faces. The overtraining group, however, had average values closer to the values of the faces than the regular training group. These results indicate that the amount of training is an experimental parameter that influences the strength of relations between stimuli that are found to be equivalent in matching-to-sample tests.


Introduction
Stimulus equivalence has been proposed as a behavioral model of semantic meaning (e.g., Sidman, 1986, 1994; Sidman & Tailby, 1982). Recent studies used different methodologies to demonstrate that equivalence relations have properties that are expected from genuine semantic relations (e.g., Barnes-Holmes et al., 2005; Bortoloti & de Rose, 2009, 2011a, 2012; O'Toole, Barnes-Holmes & Smyth, 2007; Haimson, Wilkinson, Rosenquist, Ouimet, & McIlvane, 2009). For example, Bortoloti and de Rose (2009) showed that stimuli originally classified as "meaningless" tend to be classified differently when they are involved in equivalence classes that include meaningful stimuli. Bortoloti and de Rose conducted conditional discrimination training to generate equivalence classes that consisted of pictures of faces that expressed emotions (meaningful stimuli) and arbitrary forms often used in experiments with the stimulus equivalence paradigm (presumably meaningless stimuli). After establishing equivalence relations, some arbitrary forms were evaluated for generalization of "expressed emotion" with a set of semantic differential scales. The semantic differential is an instrument that measures the semantic meaning of "concepts," such as words, phrases, pictures, and drawings, and allows the assessment of the semantic proximity between the evaluated concepts (Osgood, Suci, & Tannenbaum, 1957). Bortoloti and de Rose demonstrated that the meaning of the faces, evaluated through the semantic differential, was imparted to the arbitrary stimuli that were included in the same equivalence class. This result was discussed as evidence that equivalent stimuli can share similar meanings, which provides an external validation of stimulus equivalence as a model of semantic relations.
The semantic differential can be used to determine if "semantic similarities" between equivalent stimuli differ as a function of the experimental parameters employed. Bortoloti and de Rose (2009) showed that the similarity between evaluations of the arbitrary stimuli and faces equivalent to them decreased as the nodal number (also known as nodal distance) increased, apparently supporting previous findings by Fields and colleagues (e.g., Fields, Adams, Verhave, & Newman, 1993; Fields, Landon-Jimenez, Buffington, & Adams, 1995; Moss-Lourenco & Fields, 2011). Bortoloti and de Rose also compared groups that formed classes after training with simultaneous matching with groups that formed classes after training with 2-s delayed matching. They found that the participants in the groups trained with delayed matching evaluated the arbitrary stimuli as more similar to the equivalent faces than the participants in the groups trained with simultaneous matching. Bortoloti and de Rose concluded that this higher level of correspondence between evaluations was likely attributable to a strengthening of the equivalence relations determined by the delayed matching procedure.
The present study also used the semantic differential to assess the extent to which the evaluations of arbitrary forms become similar to the evaluations of pictures of faces that express emotions when those stimuli are members of the same equivalence classes. Similarity was estimated from deviations between evaluations of the arbitrary stimuli and the faces, so that levels of similarity are inversely related to the deviations. We assume that the level of similarity can be considered a quantitative measure of the relational strength between stimuli belonging to the same class.
The experimental parameter assessed in the present study was the amount of training. We sought to determine whether a doubling in the number of baseline training trials would have a marked effect on semantic differential ratings. Two experimental groups of undergraduates received conditional discrimination training to generate three equivalence classes that consisted of pictures of faces that expressed emotions and abstract forms. The relations taught to all of the participants were the same, but one group received twice as many training trials as the other group. After the establishment of equivalence classes, both groups evaluated some of the abstract stimuli through the semantic differential, and these evaluations were compared to evaluations of the faces made by a control group.

Participants
Participants were 34 undergraduates majoring in the humanities, biological sciences, or arts, divided into two experimental groups: regular training group (n = 17) and overtraining group (n = 17). Their native language was Brazilian Portuguese, and they were not familiar with stimulus equivalence or related phenomena and concepts. Data from the control group (n = 25) in the study by Bortoloti and de Rose (2009) were also used in the present study.

Equipment, setting, and stimuli
An Apple Macintosh G4 computer presented stimuli and recorded responses using the MTS software (version 10.32; Dube & Hiris, 1997). Each trial displayed five white windows (6 cm × 6 cm) on a gray screen, one at the center and one near each of the monitor's corners. Participants responded by moving the computer's mouse to position a cursor on a window and then clicking the mouse's button.
The sessions were conducted in a 2 m × 3 m laboratory room and were approximately 30-50 min long.
Participants in the experimental groups also completed semantic differential scales in this room. The control group from Bortoloti and de Rose (2009) completed semantic differential scales in their classroom.
Figure 1 presents the stimuli used in the experiment. Set A comprised 12 pictures, including four angry faces (A1), four neutral faces (A2), and four happy faces (A3). Sets B, C, D, and E comprised three abstract forms each. The pictures were extracted from the Pictures of Facial Affect CD-ROM, purchased from Paul Ekman's website (https://www.paulekman.com/). Several pictures of human faces that depicted expressions of happiness, anger, disgust, fear, surprise, and sadness were recorded on this CD-ROM. The pictures selected for this study were judged to be expressions of happiness and anger by 100% of the judges who evaluated the faces.

Procedure
Phase 1: Establishment of equivalence classes. Participants in both experimental groups were taught the same set of conditional relations with the same stimuli through a simultaneous matching-to-sample procedure. The experimental parameter manipulated as the critical difference between these two groups was the number of training trials. For participants in the overtraining group, the number of trials was twice the number of the regular training group (Table 1). To ensure that the overtraining group would have exactly twice as many trials as the regular training group, a noncorrection procedure was used, in which the trials were not repeated when participants made errors. The participants, therefore, had only the prearranged number of trials to learn the stimulus relations that were presented.
Each matching-to-sample trial began with the presentation of the sample stimulus in the central window. A click on this window produced a set of three comparison stimuli, one in each of three of the peripheral windows. The other peripheral window remained blank, and the sample remained in the central window. A click on the window that contained the stimulus designated as correct produced a sequence of tones and a display of stars that moved on the computer screen. Incorrect responses blackened the screen for 3 s. The feedback for a correct or incorrect response ended the trial, and a new trial began after a 2-s intertrial interval.
The first block of trials taught conditional discrimination AB. Sample A1 could be any one of the angry faces, according to a randomized sequence. In a similar fashion, sample A2 could be any one of the neutral faces, and sample A3 could be any one of the happy faces. The positions of the comparison stimuli were determined according to a randomized sequence. In each of the first 12 trials of this block, a written prompt was presented on the screen. The Portuguese equivalent of the phrase "When this is here" was presented above the sample, and the Portuguese equivalent of "Pick this" was presented above the correct comparison. These 12 prompted trials were followed by 24 trials without prompts for the participants in the regular training group and 48 trials without prompts for the participants in the overtraining group. A similar procedure was used to teach the AC, CD, and DE relations. Each of these trial blocks (AB, AC, CD, and DE), therefore, had 12 prompted trials followed by 24 unprompted trials for the regular training group and 48 unprompted trials for the overtraining group. Each of these blocks was presented only once, regardless of the participant's performance.
The next block mixed 12 trials of each conditional relation (AB, AC, CD, and DE) for the regular training group and 24 trials of each conditional relation for the overtraining group, thus comprising 48 trials for the former and 96 trials for the latter, in a randomized sequence. Regardless of participants' performance in this block, the Portuguese equivalent of the message "The computer will no longer signal if your choices are correct or wrong" was then displayed on the screen, and the mixed block was repeated without differential consequences for correct and incorrect responses. Table 1 presents the number of trials per block for the two experimental groups. Two blocks of 24 probe trials without differential consequences tested equivalence class formation. The first block evaluated the emergence of the BE-derived relation. The second probe block tested emergent conditional discrimination EB.
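The block structure described above can be tallied with a short sketch. The trial counts are taken from the text; the block labels, the dictionary layout, and the function names are illustrative assumptions and are not part of the MTS software used in the study.

```python
# Trial counts as described in the text. Labels and layout are illustrative only.
PROMPTED_PER_BLOCK = 12
UNPROMPTED_PER_BLOCK = {"regular": 24, "overtraining": 48}
MIXED_PER_RELATION = {"regular": 12, "overtraining": 24}
RELATIONS = ["AB", "AC", "CD", "DE"]
PROBE_BLOCKS = {"BE": 24, "EB": 24}


def trial_schedule(group):
    """Return the ordered list of (block_label, n_trials) for one group."""
    schedule = []
    # Single-relation blocks: prompted trials first, then unprompted trials.
    for relation in RELATIONS:
        schedule.append((f"{relation} prompted", PROMPTED_PER_BLOCK))
        schedule.append((f"{relation} unprompted", UNPROMPTED_PER_BLOCK[group]))
    # Mixed block, presented once with and once without differential consequences.
    mixed = MIXED_PER_RELATION[group] * len(RELATIONS)
    schedule.append(("mixed, with feedback", mixed))
    schedule.append(("mixed, without feedback", mixed))
    # Probe blocks are the same for both groups.
    for label, n_trials in PROBE_BLOCKS.items():
        schedule.append((f"{label} probe", n_trials))
    return schedule


def total_trials(group, include_probes=False):
    """Sum the trials in the schedule, optionally counting the probe blocks."""
    return sum(
        n for label, n in trial_schedule(group)
        if include_probes or not label.endswith("probe")
    )


if __name__ == "__main__":
    for group in ("regular", "overtraining"):
        print(group, total_trials(group))
```

Under this reading of the text, the doubling applies to every block except the 12 prompted trials that open each single-relation block, which are the same for both groups.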
The probe blocks designed for this study can be considered as combined tests of equivalence: The emergence of BE and EB implied logically that all symmetrical and transitive relations necessary to demonstrate equivalence between stimuli A, B, C, D, and E were established. In this arrangement, equivalence classes could be tested without the presentation of the faces and the arbitrary stimuli that would be evaluated through the semantic differential (A and D stimuli).
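The logic of the combined test can be illustrated by tracing how each probe relation could be derived from the trained relations AB, AC, CD, and DE. The sketch below is only an illustration of that logic; the function name and the "trained"/"symmetry" step labels are assumptions introduced here.

```python
from collections import deque

# Trained conditional relations, written as (sample set, comparison set).
TRAINED = [("A", "B"), ("A", "C"), ("C", "D"), ("D", "E")]


def derivation_path(sample, comparison, trained=TRAINED):
    """Breadth-first search from sample to comparison over the trained relations.

    A step taken in the trained direction is labeled "trained"; a step taken in
    the reverse direction is labeled "symmetry". Chaining steps is transitivity.
    """
    neighbors = {}
    for s, c in trained:
        neighbors.setdefault(s, []).append((c, "trained"))
        neighbors.setdefault(c, []).append((s, "symmetry"))
    queue = deque([(sample, [])])
    seen = {sample}
    while queue:
        node, path = queue.popleft()
        if node == comparison:
            return path
        for nxt, kind in neighbors.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [(node, nxt, kind)]))
    return None  # no chain of trained relations links the two sets


if __name__ == "__main__":
    for probe in (("B", "E"), ("E", "B")):
        print(probe, derivation_path(*probe))
```

In this sketch the BE probe chains the symmetric counterpart of AB with AC, CD, and DE, and the EB probe chains the symmetric counterparts of DE, CD, and AC with AB, so the two probes together draw on every trained relation in both directions while passing through the A and D sets without presenting them.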
Participants proceeded through training and equivalence tests, according to this programmed sequence, without mastery criteria. The next phase, however, was conducted only with participants who made no more than two errors in each probe block: 11 in the regular training group and 10 in the overtraining group. These participants met the criterion to conclude that they formed three equivalence classes. The other participants ended their participation at this point.
Phase 2: Evaluation of the stimuli through the semantic differential. Participants who met the equivalence criterion were instructed to evaluate the abstract stimuli D1, D2, and D3 through the semantic differential. Each scale comprised seven intervals and was anchored by "polar terms" (i.e., a pair of opposite adjectives). The set of scales was printed on an A4 sheet of paper that also depicted one of the "D" stimuli, as represented in Figure 2. For all of the participants, the first sheet after the instructions displayed stimulus D2, equivalent to the neutral expression. The other sheets displayed D1 (equivalent to the angry expression) and D3 (equivalent to the happy expression) in an order that varied among participants.
The intervals in all of the scales received a value that varied from -3 to +3. The value -3 was assigned to the position closest to the adjective regarded as negative, and the value +3 was assigned to the position closest to the adjective regarded as positive. To aid presentation, Figure 2 shows the adjectives considered negative on the left and the ones considered positive on the right, and the respective values are printed below the scales. The values were not printed on the sheets of paper that were given to the participants, and the position of the adjectives was randomized.
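Because the position of the adjectives was randomized on the printed sheets, a mark must be recoded before it can be read as a value from -3 to +3. The following sketch shows one way to do that; the function name, the 1-to-7 left-to-right position coding, and the `positive_on_right` flag are assumptions made for illustration and do not describe the authors' scoring materials.

```python
def score_mark(position, positive_on_right, n_intervals=7):
    """Convert a marked interval into a value from -3 to +3.

    position: interval marked by the participant, counted from the left (1..7).
    positive_on_right: True if the positive adjective was printed on the right.
    """
    if not 1 <= position <= n_intervals:
        raise ValueError("position must be between 1 and n_intervals")
    # Center the scale on its middle interval: 1..7 becomes -3..+3.
    centered = position - (n_intervals + 1) // 2
    # Flip the sign when the positive adjective was printed on the left.
    return centered if positive_on_right else -centered


if __name__ == "__main__":
    print(score_mark(7, positive_on_right=True))   # mark next to the positive adjective
    print(score_mark(7, positive_on_right=False))  # mark next to the negative adjective
```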

Control group
Data from the control group from Bortoloti and de Rose (2009) were used in the present study. This group consisted of 25 undergraduates who evaluated the stimuli from set D and all of the pictures of faces that expressed emotions. The control group did not undergo conditional discrimination training.

Results
Of the 17 participants in each group, 11 in the regular training group and 10 in the overtraining group exhibited the emergence of the BE and EB relations, which were indicative of the formation of equivalence classes comprising faces and arbitrary stimuli. For these 21 participants, the semantic differential was used to evaluate the D stimuli in each equivalence class. Figure 3 shows the medians of the evaluations of the stimuli that were equivalent to the happy and angry faces by each experimental group. The medians of the evaluations of happy and angry faces by the control group are also presented in Figure 3.
The median values of the evaluations of angry and happy faces by the control group were different across the scales. The happy faces received median positive evaluations in 11 scales and neutral evaluations in only two of the scales of the semantic differential instrument (poor/rich and submissive/dominant). The highest positive evaluations of the happy faces were observed in the scales anchored by the polar terms sad/happy, negative/positive, and unpleasant/pleasant. The angry faces received median negative evaluations in eight scales (tense/relaxed, rough/smooth, ugly/beautiful, negative/positive, hard/soft, bad/good, unpleasant/pleasant), neutral evaluations in two scales (sad/happy, poor/rich), and positive evaluations in three scales (slow/fast, passive/active, submissive/dominant). The evaluation of the D stimuli by the control group (not plotted) did not deviate much from neutrality. In contrast, the median evaluations for the D stimuli by the experimental groups tended to approximate the median values for evaluations of the equivalent faces. Visual inspection of Figure 3 reveals that the evaluations of the D stimuli by the overtraining group tended to be closer to the evaluations of the faces than the evaluations of the D stimuli by the regular training group. Deviation scores based on the absolute values of differences between the evaluation of faces and the evaluation of stimuli equivalent to them were calculated for each of the 13 scales of the semantic differential used in this study. Thus, if the median evaluation of the happy faces in a scale of the semantic differential was identical to the median evaluation of the D stimulus equivalent to the happy faces, then the deviation on that scale was 0.
If the evaluations differed, then one value was subtracted from the other, and the absolute value of the difference constituted the deviation on that scale. Therefore, larger deviations would be associated with less similarity between evaluations of the faces (the A stimuli) and the stimuli that were equivalent to them (i.e., the D stimuli).
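The deviation score described above can be sketched as follows. The computation (absolute difference between two medians on each scale) follows the text; the function names and the ratings in the usage example are made up for illustration and are not data from the study.

```python
from statistics import median


def scale_medians(ratings_by_scale):
    """Median rating on each scale, given {scale_name: [ratings from -3 to +3]}."""
    return {scale: median(values) for scale, values in ratings_by_scale.items()}


def deviation_scores(face_medians, d_medians):
    """Absolute difference between face and D-stimulus medians on each scale."""
    return {
        scale: abs(face_medians[scale] - d_medians[scale])
        for scale in face_medians
    }


if __name__ == "__main__":
    # Hypothetical ratings for two of the scales; NOT data from the study.
    faces = {"sad/happy": [3, 2, 3], "bad/good": [2, 2, 1]}
    d_stimulus = {"sad/happy": [2, 3, 2], "bad/good": [2, 1, 2]}
    print(deviation_scores(scale_medians(faces), scale_medians(d_stimulus)))
```

A deviation of 0 on a scale means the two medians coincide, and larger values mean the D stimulus was evaluated less like the equivalent faces.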
Figure 4 shows the interquartile range of the deviation scores for evaluations of the abstract forms and happy and angry faces in the regular training and overtraining groups. Semantic differential evaluations were less deviant when overtraining was used than when regular training was used (t(25) = 1.80, p < .05).
Differences in evaluations for different emotional expressions were also compared. Figure 5 presents the interquartile range of the deviation scores for evaluations of stimuli equivalent to the happy and angry faces combined across both experimental groups. Lower deviation scores were found for the stimulus that was equivalent to the happy faces than for the stimulus that was equivalent to the angry faces (t(25) = 2.15, p < .05).

Discussion
This study investigated whether overtraining of baseline relations could influence the strength of equivalence relations. Two groups of participants were trained to establish equivalence classes comprising pictures of faces expressing emotions (A) and abstract forms (B, C, D, and E). One group received regular training and the other received overtraining of the same baseline relations AB, AC, CD, and DE. Then, all participants were submitted to BE and EB equivalence probes, to assess class formation. Regular training and overtraining yielded equivalence classes with similar likelihood. Participants who formed classes used a semantic differential to evaluate the D stimuli, and these evaluations were compared to evaluations of the faces (the A stimuli) made by a control group of participants. The level of similarity between the evaluations of the faces and the evaluations of the abstract stimuli was taken as a quantitative measure of the relational strength between stimuli that were members of the same equivalence class. This measure was sensitive to the amount of training: Evaluations were more similar for the classes that had overtraining of the baseline relations than for those that had a regular level of training. Interestingly, the similarity of the evaluations was greater for the D stimuli equivalent to the happy faces than for the D stimuli equivalent to the angry faces. Some factors that might account for these outcomes are presented below.
Establishment of equivalence classes. Different amounts of baseline training produced approximately the same yield of equivalence classes: 65% of the participants who received regular training (11 out of 17) and 59% of those who received overtraining (10 out of 17) demonstrated consistent performances in BE and EB probes. The yield of both groups is high compared to studies that used only arbitrary stimuli in a simultaneous training protocol (see Fields, Arntzen, Nartey, & Eilefsen, 2012). The inclusion of meaningful stimuli in each of the programmed equivalence classes may have contributed to the yield achieved in the current study. Participants were taught to establish three 5-member equivalence classes including both meaningful and arbitrary stimuli. The meaningful stimuli were members of perceptual classes (Fields & Moss, 2008): The stimuli designated as A1, A2, and A3 were not individual stimuli; rather, each comprised four pictures of faces, with each face belonging to a different person. The common feature of the faces in each category was the emotional expression, which was an angry expression in A1, a neutral expression in A2, and a happy expression in A3. Perceptual classes were used to ensure that abstract stimuli would be equivalent to a particular emotional expression and not to idiosyncratic features of a particular face. Thus, the three 5-member equivalence classes included both perceptually related meaningful stimuli and arbitrarily related stimuli. Equivalence classes that include meaningful stimuli share more features of natural language classes than classes comprising only arbitrary stimuli. Previous studies from our laboratory also showed high yields when pictorial stimuli were included in matching-to-sample training (e.g., Bortoloti & de Rose, 2007, 2009, 2012). Fields et al. (2012) noted the improvement in yield produced by including meaningful stimuli in the simultaneous protocol and described this effect systematically: The authors showed that the yield of equivalence classes increased from 20% to 80% when training included meaningful and arbitrary stimuli, in comparison with training with arbitrary stimuli only. Data from the current study are consistent with those previous findings.
Graded generalization of semantic differential evaluations. Different levels of similarity found in the semantic differential scores can be related to assumptions introduced by Fields and colleagues in the equivalence literature (e.g., Fields et al., 1995; Moss-Lourenco & Fields, 2011). Fields and colleagues claimed that stimuli in equivalence classes may vary in their relatedness. Studies by Bortoloti and de Rose (2009, 2012) appear to support the assumptions of Fields et al., who reported evidence that scores in different post-class-formation tests may vary as a function of several procedural parameters. The notion of varying degrees of relatedness, however, seems incompatible with the mathematical notion of equivalence. Both Bortoloti and de Rose (2011b) and Doran and Fields (2012) noted that the contradiction between equivalence and degrees of relatedness is reminiscent of the famous remark in George Orwell's book Animal Farm, that all animals are equal but some are "more equal" than others. The irony in Orwell's remark implies satire about a society that admitted graded equality, which is the very absence of equality. The same would apply to equivalence relations between stimuli, which should not be subject to degrees of equivalence.
Is it possible to reconcile equivalence relations with the graded transfer of meaning observed in the semantic differential evaluations? A reconciliation becomes conceivable if we return to the early equivalence literature, in which stimulus equivalence was sometimes distinguished from functional equivalence. Stimulus equivalence was defined by the properties of transitivity, symmetry, and reflexivity. Functional equivalence was defined by the transfer of functions. A few studies suggested less-than-perfect congruence between stimulus equivalence and functional equivalence (e.g., de Rose, McIlvane, Dube, & Stoddard, 1988; Sidman, Wynne, Maguire, & Barnes, 1989). This led investigators to raise the possibility that these might be closely related but not identical phenomena. Sidman (1994), however, strongly argued against this conclusion. Since then, functional equivalence as a possibly distinct phenomenon disappeared from the literature. Nevertheless, studies have continued to show a graded transfer of functions within equivalence classes (e.g., Bortoloti & de Rose, 2009, 2012; Moss-Lourenco & Fields, 2011). Sidman (1994) argued against the concept of functional equivalence and pointed to the mathematical contradiction of degrees of equivalence. However, it appears that sometimes he acknowledged that the relational strength among stimuli may vary. Sidman (2000) observed, for instance, that some experimental parameters could affect the strength of baseline and emergent conditional discriminations, which necessarily implies some impact on the relational strength of equivalent stimuli: "For example, we might make Comparison B2, or some undefined stimuli, very similar to B1; or those other stimuli may be more attractive to the subject than B1 is; or some undefined response may be much easier than Response 1 is for the subject; or some undefined consequence may be a more effective reinforcer than what we have defined as the reinforcer. Such possibilities will weaken the AB conditional discrimination and any relation we might expect to be derived from it." (Sidman, 2000, p. 131, emphasis added)
Recently, Doran and Fields (2012) dealt with this apparent contradiction. The authors argued that differential relatedness among equivalent stimuli does not imply that these stimuli cannot be functionally interchangeable, as required by the equivalence paradigm. According to their view, even if all stimuli within an equivalence class are differentially related, they are still more closely related to each other than to stimuli from other classes. Whether the same set of equivalent stimuli will act as fully interchangeable or not will depend on the demands of the task. Thus, for instance, when the task requires cross-class discrimination (e.g., a typical matching-to-sample equivalence test), the equivalent stimuli will be fully substitutable for each other. However, if the task does not demand cross-class discrimination (e.g., a matching-to-sample test designed to measure relational preferences among equivalent stimuli), then the stimuli will not be fully interchangeable, and the differential relatedness between them will play an important role in the participant's performance. Both between-class and within-class stimulus control relations could coexist, but each one would be singly evoked as a function of the task requirement.
Applying this reasoning to the current study, during the matching-to-sample trials, cross-class discriminative contingencies prevailed and controlled the participants' performance, which expressed discrimination between classes. Then, the semantic differential was used to evaluate within-class relatedness. The cross-class stimulus control topographies were nonexistent in this phase, and the differential relatedness determined by the amount of training and (possibly) by the valence of the emotional stimuli prevailed and controlled participants' evaluations.
Valence effects on deviation scores. Interestingly, a difference in evaluations for different emotional expressions was found. Evaluations with the semantic differential showed lower deviation scores for stimuli that were equivalent to the happy faces than for stimuli that were equivalent to the angry faces. This result is consistent with results reported by Bortoloti and de Rose (2012) and a reanalysis of the data by Bortoloti and de Rose (2009). Bortoloti and de Rose (2011) reanalyzed data from Experiment 2 of Bortoloti and de Rose (2009), showing the mean deviation between evaluations of the abstract stimuli and evaluations of the faces that were equivalent to them. This reanalysis showed lower deviation scores for stimuli that were equivalent to the happy faces than for stimuli that were equivalent to the angry faces. In three of our studies, arbitrary stimuli that were equivalent to happy faces appeared to be more strongly related to the faces than arbitrary stimuli that were equivalent to angry faces. Currently, these data appear to be consistent with studies that described faster and more intense responses to happy faces (Batty & Taylor, 2003; Kirita & Endo, 1995; Leppänen, Kauppinen, Peltola, & Hietanen, 2007) but inconsistent with studies that described faster responses to angry expressions (Fox, Lester, Russo, Bowles, Pichler, & Dutton, 2000; Hansen & Hansen, 1988; Öhman, Lundqvist, & Esteves, 2001). Further investigations of the effects of emotional stimuli on the strength of symbolic relations simulated according to the equivalence paradigm are necessary. If the differences reported herein prove to be genuine, however, another avenue for investigation involving stimulus equivalence will be opened.
Overtraining effects. How would overtraining contribute to strengthening the relations among equivalent stimuli revealed by the semantic differential? This is not entirely clear, but exposure to more training trials may sharpen both types of discrimination required to perform matching-to-sample trials. Accurate performance in matching-to-sample trials requires successive discriminations among samples and simultaneous discriminations among comparison stimuli. With regard to the successive discriminations among samples, exposure to a number of trials is necessary to discriminate that there are sets of stimuli with common features representing more abstract emotional expressions. Exposure to more trials should foster acquisition of the specific discriminations involved and formation of the broader categories. In addition, overtraining might improve the simultaneous discriminations among the arbitrary comparison stimuli. Overall, overtraining might refine the stimulus control involved in establishing relations between the stimuli in each class. It is possible that these refinements account for the results obtained, so that sharpening the discriminations among the experimental stimuli with overtraining produced the effects observed on the semantic differential ratings. That is, equivalence classes were strengthened by sharpening discriminations among the arbitrary stimuli and the sets of faces, and this reduced variation in ratings of these stimuli.
The experimental parameters that impact the relational strength of equivalent stimuli constitute an important issue that has not been fully investigated. Fields and colleagues showed that some structural variables, such as nodal distance (Fields et al., 1995; Moss-Lourenco & Fields, 2011; see also Bortoloti & de Rose, 2009) and the number of logical properties involved in the relation (Doran & Fields, 2012), may have an impact on the relatedness of equivalent stimuli. Bortoloti and de Rose revealed that some non-structural variables, such as matching delay (Bortoloti & de Rose, 2009, 2012), may also influence the relational strength of equivalent stimuli. The present study showed the influence of two other non-structural parameters: overtraining of baseline relations and the valence of emotional stimuli involved in the equivalence classes. Besides the theoretical relevance of understanding the precise circumstances that affect the strength of equivalence relations, this knowledge also has practical implications. The notion of stimulus equivalence has provided operational criteria for attributing symbolic functions to stimuli and generated a powerful technology for education and rehabilitation (e.g., Almeida-Verdu et al., 2008; Cowley, Green, & Braunling-McMorrow, 1992; Fienup, Covey, & Critchfield, 2010; Lynch & Cuvo, 1995; Melchiori, de Souza, & de Rose, 2000; Rehfeldt, 2011; Rehfeldt & Barnes-Holmes, 2009). The demonstration that some experimental parameters influence the strength of the relations can be important for technological applications based on stimulus equivalence. In such applications, maximizing the strength of symbolic relations will often be desirable. Therefore, knowledge about the parameters that affect this strength is potentially very important for technological applications of equivalence.

Figure 1. Stimuli used in the experiment and schematic representation of the trained relations.

Figure 2. Example of a "D" stimulus above the set of bipolar scales. The participants received four sheets of paper. The first sheet contained instructions to fill in the scales. Each of the three other sheets displayed stimulus D1, D2, or D3 above the set of bipolar scales. For all of the participants, the sheet immediately after the instructions displayed stimulus D2, equivalent to the neutral expression.

Figure 3. Medians of the evaluations of faces by the control group and medians of evaluations of "D" stimuli by the experimental groups.

Figure 4. Interquartile range of the deviation scores between evaluations of the abstract stimuli and evaluations of the faces that were equivalent to them.

Figure 5. Interquartile range of the deviation scores for evaluations of stimuli equivalent to the happy and angry faces combined across the regular and the overtraining groups.

Table 1. Number of trials of each type presented to the two experimental groups.