Psychometric equivalence of the Brazilian version of the Test of Pragmatic Language 2 – TOPL-2

ABSTRACT
Introduction: Considering the relevance of investigating pragmatic function for the diagnosis of language disorders, as well as the relevance of validity evidence for test interpretation, this paper aimed to explore the psychometric equivalence between the American and Brazilian versions of the Test of Pragmatic Language, Second Edition (TOPL-2). Methods: A total of 81 students from the third to seventh year of public elementary schools in São Paulo (63% girls; mean age=9.42 years, SD=0.93) with no complaints or indicators of low school performance were selected. Students answered the orally administered Brazilian Portuguese version of TOPL-2. Some of the psychometric analyses reported for the original version of the test were reproduced, focusing on item difficulty, internal consistency, discrimination, and differential item functioning. Results: Most of the items (86%) showed difficulty levels similar to those of the American version. Internal consistency was acceptable (0.70). Few items (26%) showed adequate discrimination, but 49% were close to the desirable cutoffs. Eight items showed differential functioning, five of which favored boys. Conclusion: Despite sample limitations related to size and variability, almost half of the items showed psychometric equivalence to the American version.


INTRODUCTION
In the field of human communication disorders, especially in speech-language pathology and audiology, researchers have repeatedly sought to identify relationships among the processes of the various subsystems of language. The analysis of speech acts cannot be dissociated from the characteristics of language use, which implies recognizing that the pragmatic function of language underlies the regulatory acts of communication and, to some extent, even determines them (1).
Linguistic studies consistently postulate that pragmatic bases are critical to the effectiveness of human communication, whether mediated by oral or written language, in both expressive and receptive modes (2)(3)(4)(5). Evaluations of human communication, especially speech, must therefore consider not only the basic mechanisms for expressing and understanding oral or written language but also those related to its pragmatic function. Accordingly, the assessment of pragmatic function must encompass the abilities to make inferences, to self-monitor, and to think critically, all of which are essential for proficient discursive activities related to reading comprehension.
The lack of Brazilian instruments for assessing different language functions or processes, together with the importance of generalizing results across samples, explains the interest of Brazilian researchers in adapting internationally established language tools to the national context (6)(7). However, for a test built in one culture to be used with assured quality in another, researchers must go beyond the traditional translation and adaptation procedures that establish the semantic and cultural equivalence of the items and also ensure that the psychometric properties of the original and new versions are equivalent (8).
Recognition of the importance of pragmatic language in regulating the self-monitoring mechanisms involved in understanding and producing speech led to a research project investigating the relationship between the pragmatic function of language and performance in reading comprehension and writing (1,9,10). To this end, we analyzed several tools for assessing pragmatic function. The Test of Pragmatic Language (TOPL) (11) is an American test now in its second edition (TOPL-2) (12). Both editions aim to evaluate the pragmatic, or social, function of language. TOPL-2 has been translated and adapted to Brazilian Portuguese (BP) (13). Its 43 orally administered items provide important information about conflict-resolution and social skills. The test, in its North and South American versions, has proven useful for assessing the pragmatic function of language in oral communication. Studies have shown its effectiveness in identifying problems in specific populations, including people with autism, Williams syndrome, and specific language disorder (14)(15)(16).
The translation and adaptation of TOPL-2 into BP followed the recommendations of Beaton et al. (17). The test was translated by a certified translator and administered to a group of Brazilian elementary school students (18). The answers collected in this procedure were analyzed, and the items with the most errors were reanalyzed by three Brazilian speech-language pathologists fluent in English, who also reviewed the test prompts. To check grammatical and idiomatic equivalence, the test was then sent to a second translator, unfamiliar with the original, for back-translation: the Brazilian version was translated back into English and compared with the original, which confirmed its fidelity.
Given the importance of pragmatic function studies for the diagnosis of language disorders, in both oral and written communication, and the need to use language assessment tests with psychometric validity, the objective of this study was to collect evidence of the psychometric equivalence between the American and Brazilian versions of TOPL-2.
Thus, after translation and linguistic adaptation, TOPL-2 was administered to another sample of typically developing students to replicate the main statistical analyses reported in the test manual for item properties (reliability, difficulty, discrimination, and differential functioning). Item-level psychometric analysis makes it possible to identify the items that should be revised before a subsequent normative study. The hypothesis of this pilot study was that most of the test items would show psychometric properties similar to those of the original (American) version of the instrument.

METHOD
Ethical considerations
This study was approved by the Research Ethics Committee of Escola Paulista de Medicina, Universidade Federal de São Paulo (EPM-UNIFESP, protocol 1.731/08).

Sample
The sample comprised 81 children (63% girls), aged 8 to 11 years (mean=9.42 years, SD=0.93), enrolled in the third to seventh grades of elementary education at two public schools in São Paulo. Participants were initially nominated by their teachers, who had been instructed to select students without complaints or indicators of reading deficits or poor academic performance. Students were then screened, and only those who achieved the expected reading rate and accuracy for their educational level, according to the criteria presented by Carvalho (9), were included in the sample. Parents and caregivers signed the informed consent form.

Instruments
CoDAS 2015;27(4):344-9
Participants answered the version of TOPL-2 (12) translated and adapted to BP (13), an orally administered test consisting of questions posed by the examiner, most of them based on pictures from the original. The items are designed to provide information on six subcomponents of the pragmatic function of language:
• Physical context: environmental cues that signal appropriate communication standards (e.g., talking quietly in a library);
• Audience: ability to monitor a variety of factors related to the interlocutor and to adapt one's way of communicating accordingly (e.g., the age or number of listeners);
• Topic: ability to keep content appropriate to the topic, maintain logical consistency in the flow of conversation, and monitor the progress of the subject matter to resolve problems that hinder understanding;
• Purpose: characteristics related to the purpose of a conversation, as well as the changes and linguistic manipulations used to achieve it (e.g., asking questions);
• Visual cues: attention to nonverbal aspects of communication (e.g., body language);
• Abstraction: perception of information conveyed through abstract language, usually used to communicate emotions, images, and other messages that cannot or should not be transmitted directly (e.g., the saying "Better safe than sorry" conveys a message of prudence).
Each subcomponent may be assessed by one or more items, and a single item may address more than one subcomponent: item 2 assesses "Topic" and "Purpose," whereas item 3 assesses only "Abstraction." In addition to the pragmatic function of language, participants were assessed for reading decoding, grammatical closure, listening and reading memory, and comprehension; these results are not reported in this article.

Procedures
Students were evaluated by a single examiner in rooms reserved by the school administrators. TOPL-2 was administered individually, and sessions lasted 40 minutes on average.

Data analysis
The analyses replicated, in the pilot sample, some of the studies conducted for the American standardization of TOPL-2, with the aim of investigating the psychometric comparability of the cultural and linguistic adaptation of the instrument with the original version.
Classical test theory analyses were used, with acceptance criteria as defined by the test authors:
• Internal consistency (Cronbach's alpha): values above 0.70 were considered adequate;
• Difficulty (proportion of correct answers per item): intermediate values (0.15-0.85) were desirable;
• Discrimination (point-biserial correlation): values of 0.35 or higher;
• Differential item functioning (DIF): logistic regression, with total score (Model 1) or total score plus gender (Model 2) as predictors of item accuracy.
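To make the first three criteria concrete, the following is a minimal sketch (not the authors' SPSS procedure) of the classical item statistics for dichotomous items. It assumes a hypothetical `scores` matrix with one row per examinee and one 0/1 entry per item, uses population variances (the n versus n-1 choice cancels out of alpha), and computes the uncorrected point-biserial correlation (item included in the total); whether the TOPL-2 manual used a corrected item-total correlation is not stated here. The "alpha if item deleted" helper corresponds to the kind of value reported in column α of Table 2.

```python
import math


def _mean(xs):
    return sum(xs) / len(xs)


def _pvar(xs):
    """Population variance."""
    m = _mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)


def cronbach_alpha(scores):
    """alpha = k/(k-1) * (1 - sum of item variances / variance of total scores)."""
    k = len(scores[0])
    items = list(zip(*scores))  # transpose: one tuple of 0/1 answers per item
    totals = [sum(row) for row in scores]
    return k / (k - 1) * (1 - sum(_pvar(it) for it in items) / _pvar(totals))


def alpha_if_deleted(scores, j):
    """Alpha recomputed after dropping item j from every examinee's row."""
    return cronbach_alpha([row[:j] + row[j + 1:] for row in scores])


def difficulty(scores):
    """Difficulty index p: proportion of correct answers for each item."""
    return [_mean(it) for it in zip(*scores)]


def point_biserial(scores, j):
    """Pearson correlation between 0/1 item j and the total score.

    Undefined (division by zero) if everyone answered item j the same way.
    """
    item = [row[j] for row in scores]
    totals = [sum(row) for row in scores]
    mi, mt = _mean(item), _mean(totals)
    cov = sum((a - mi) * (b - mt) for a, b in zip(item, totals)) / len(item)
    return cov / math.sqrt(_pvar(item) * _pvar(totals))


def meets_manual_cutoffs(scores, j):
    """Cutoffs as described in the text: 0.15 <= p <= 0.85 and r_pb >= 0.35."""
    p = difficulty(scores)[j]
    return 0.15 <= p <= 0.85 and point_biserial(scores, j) >= 0.35


if __name__ == "__main__":
    toy = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]  # hypothetical toy data
    print(cronbach_alpha(toy), difficulty(toy), point_biserial(toy, 1))
```

For the toy matrix, alpha is 0.75, the difficulties are 0.75, 0.5, and 0.25, and the middle item's point-biserial correlation is 2/√5 ≈ 0.89, so it passes both cutoffs.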
When the two models differed, DIF was quantified as the difference in the variance explained by the models (R²), classified according to the following criteria (12): (d.1) insignificant DIF, ΔR² < 0.035; (d.2) moderate DIF, 0.035 ≤ ΔR² < 0.070; and (d.3) high DIF, ΔR² ≥ 0.070.
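The two-model comparison can be sketched as follows, assuming hypothetical inputs: `item` (0/1 answers to one item), `total` (total scores), and `gender` (coded 0/1). This is an illustrative stand-alone implementation, not the SPSS routine used in the study. Both logistic models are fitted by Newton-Raphson, their fit is expressed as Nagelkerke R² (the variant named in the caption of Table 3), and ΔR² is mapped onto the three DIF levels. It assumes the item is not answered identically by everyone and that the data show no complete separation, since otherwise the maximum-likelihood estimates do not exist.

```python
import math


def _solve(a, b):
    """Solve the small linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))  # partial pivoting
        m[c], m[piv] = m[piv], m[c]
        for r in range(n):
            if r != c:
                f = m[r][c] / m[c][c]
                for j in range(c, n + 1):
                    m[r][j] -= f * m[c][j]
    return [m[i][n] / m[i][i] for i in range(n)]


def _probs(beta, rows):
    """Predicted probability of a correct answer for each examinee."""
    return [1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, r)))) for r in rows]


def fit_logistic(predictors, y, iters=100, tol=1e-10):
    """Fit a logistic model with intercept by Newton-Raphson; return its log-likelihood."""
    rows = [[1.0] + list(x) for x in predictors]
    k = len(rows[0])
    beta = [0.0] * k
    for _ in range(iters):
        p = _probs(beta, rows)
        grad = [sum((yi - pi) * r[a] for r, yi, pi in zip(rows, y, p)) for a in range(k)]
        info = [[sum(pi * (1 - pi) * r[a] * r[b] for r, pi in zip(rows, p))
                 for b in range(k)] for a in range(k)]
        step = _solve(info, grad)  # Newton step: information^-1 times gradient
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    p = _probs(beta, rows)
    return sum(yi * math.log(pi) + (1 - yi) * math.log(1 - pi) for yi, pi in zip(y, p))


def null_loglik(y):
    """Log-likelihood of the intercept-only model."""
    n, s = len(y), sum(y)
    p = s / n
    return s * math.log(p) + (n - s) * math.log(1 - p)


def nagelkerke_r2(ll_model, ll_null, n):
    """Cox-Snell R^2 rescaled by its maximum attainable value."""
    cox_snell = 1 - math.exp(2 * (ll_null - ll_model) / n)
    max_cs = 1 - math.exp(2 * ll_null / n)
    return cox_snell / max_cs


def dif_delta_r2(item, total, gender):
    """Return (R^2 of Model 1, R^2 of Model 2, delta R^2) for one item."""
    n, ll0 = len(item), null_loglik(item)
    r2_m1 = nagelkerke_r2(fit_logistic([[t] for t in total], item), ll0, n)
    r2_m2 = nagelkerke_r2(fit_logistic([[t, g] for t, g in zip(total, gender)], item), ll0, n)
    return r2_m1, r2_m2, r2_m2 - r2_m1


def classify_dif(delta):
    """Thresholds as described in the text: 0.035 and 0.070."""
    if delta < 0.035:
        return "insignificant"
    if delta < 0.070:
        return "moderate"
    return "high"
```

Because Model 2 nests Model 1, its maximized log-likelihood, and hence its Nagelkerke R², cannot be lower, so ΔR² is non-negative up to rounding. In practice, the comparison would be repeated for each of the 43 items.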
In addition to the statistics reported in the original study, we calculated the discrimination index D, defined as the difference between the proportions of correct answers among examinees in the top 27% and in the bottom 27% of total scores. D values below 0.28 were considered inadequate (19).
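The D index can be sketched in a few lines. This is a minimal illustration under stated assumptions: the hypothetical inputs are one item's 0/1 answers and the matching total scores, the group size is the 27% fraction rounded to the nearest integer, and ties at the group boundaries are broken by input order. The text does not say how ties were handled.

```python
def d_index(item, totals, fraction=0.27):
    """D = proportion correct in the upper group minus that in the lower group."""
    n = len(item)
    g = max(1, round(fraction * n))  # examinees per extreme group
    order = sorted(range(n), key=lambda i: totals[i])  # ascending total score
    lower, upper = order[:g], order[-g:]
    p_lower = sum(item[i] for i in lower) / g
    p_upper = sum(item[i] for i in upper) / g
    return p_upper - p_lower


def d_is_adequate(d, cutoff=0.28):
    """Values below 0.28 were considered inadequate."""
    return d >= cutoff


if __name__ == "__main__":
    totals = list(range(1, 11))            # hypothetical total scores
    item = [1, 0, 0, 0, 1, 0, 1, 0, 1, 1]  # hypothetical answers to one item
    print(d_index(item, totals))
```

With n = 81, as in this sample, each extreme group would contain round(0.27 × 81) = 22 examinees.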
Data were analyzed with SPSS for Windows, version 18.0. A significance level of p<0.05 was adopted for all comparisons.

RESULTS
Table 1 presents the descriptive statistics. The overall mean score exceeded 50% of the maximum (mean=22.54, SD=5.16). Although skewness and kurtosis were near zero (0.39 and -0.21, respectively), the score distribution was not normal (Kolmogorov-Smirnov=0.131, df=81, p<0.001). As in the original study, participants' scores correlated with age. Unlike the original study, which used Pearson correlation, we used Spearman correlation because the data were not normally distributed. In this study, however, the correlation was low rather than moderate, as it was in the American sample.
The internal consistency (Cronbach's alpha) of the TOPL-2 items was 0.71, within the desirable range (20). Among the younger children (8 and 9 years), alpha values tended to be higher, unlike in the original study, where they remained high and fairly stable across ages. Figure 1 shows the frequency distribution of TOPL-2 scores in the Brazilian sample.
Table 2 shows the item-level psychometric indices. As for difficulty (column p), only 6 items (14%) fell outside the desirable range, with very low (1 item) or very high (5 items) accuracy. Cronbach's alpha changed little when each item was removed, always ranging from 0.68 to 0.72 (column α in Table 2). Only 11 items (26.0%) had point-biserial correlations above the cutoff set by the American study (>0.35). However, 16 other items (37.2%) showed significant correlations, ranging from 0.23 to 0.33 (column rpb). As to discrimination (column D), 21 items (48.8%) had adequate values (≥0.28), and only three of these lacked a significant point-biserial correlation.
Items that simultaneously showed a significant point-biserial correlation, intermediate difficulty, and a D index of at least 0.28 were considered psychometrically adequate. Of the 43 items in TOPL-2, only 19 (44.2%) had adequate statistics on all measures used (marked "A" in column Equivalence, Table 2). Of these 19 items, 7 (36.8%) were among the most difficult (0.15≤p≤0.40). Ten items with intermediate difficulty and either a significant point-biserial correlation or a D index above 0.28 (marked "B" in Table 2) were defined as partially equivalent. Fourteen items with accuracy that was too high or too low, a very low D index, and/or a nonsignificant point-biserial correlation (marked "C" in column Equivalence, Table 2) were classified as not equivalent and need more substantial revision. On the basis of these results, more than half of the items (67.44%) had the expected psychometric quality, either totally or partially.
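The A/B/C decision rule described above can be written as a small function. This is a sketch of one reading of the text, not code from the study. It assumes that "significant" means p<0.05 for the point-biserial correlation, supplied as a boolean (the significance test itself is not shown), and that a D index of exactly 0.28 counts as adequate, consistent with the "below 0.28 inadequate" criterion in the Method section.

```python
from collections import Counter


def equivalence_class(p, rpb_significant, d):
    """A: all three criteria met; B: intermediate difficulty plus one
    discrimination criterion; C: everything else."""
    intermediate = 0.15 <= p <= 0.85
    good_d = d >= 0.28
    if intermediate and rpb_significant and good_d:
        return "A"
    if intermediate and (rpb_significant or good_d):
        return "B"
    return "C"


def tally(items):
    """Count classes for a list of (p, rpb_significant, d) triples."""
    return Counter(equivalence_class(*it) for it in items)


if __name__ == "__main__":
    # Hypothetical item statistics, one triple per item.
    demo = [(0.50, True, 0.40), (0.50, False, 0.30), (0.95, True, 0.40)]
    print(tally(demo))
```

Counting the classes in this way reproduces the arithmetic in the text: (19 + 10) / 43 ≈ 67.44% of items were totally or partially equivalent.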
As described in the Method section, DIF was assessed by comparing two logistic regression models, with the item score as the outcome and each participant's total score (Model 1) or total score plus gender (Model 2) as predictors. Table 1 shows the overall mean of correct responses by gender and the correlation between gender and total score. Table 3 shows the mean and standard deviation of correct answers per item according to participants' gender. Most items (81.4%) had insignificant DIF (mean=0.01). Three items (5, 9, and 18) had moderate DIF (mean=0.05), and five (13, 23, 29, 39, and 43) had high DIF (mean=0.09). Most of the items with high DIF (60%) favored boys, whereas those with moderate DIF tended to favor girls (66.7%). Of the items with DIF, four (13, 23, 39, and 43) were considered psychometrically adequate. In the original version, there was no gender DIF; only one item showed moderate DIF, favoring African Americans over Americans of European origin. Table 3 summarizes these results.

DISCUSSION
This study reproduced, in a Brazilian sample, the main psychometric investigations performed on the American version of TOPL-2. Overall, the results are promising with regard to the psychometric equivalence of the two versions, but they indicate a need to adjust some items to ensure the overall quality of the adapted version.
In general, the Cronbach's alpha values found in this study were satisfactory, though slightly below those reported for the original version. The smaller sample size of this study, compared with the 1,136 participants in the American sample, could explain this result. Coefficient alpha is known to be sensitive to sample size as well as to item characteristics (21). This hypothesis is supported by the difference between the alpha values for girls (n=63) and boys (n=35), 0.75 and 0.58, respectively (Table 1). With a larger sample in the standardization study, the results could become more similar to those of the American version.
The difficulty index was the parameter on which the items were most similar to the American sample: 86% of the items had values within the range stipulated by the test authors in the manual (between 0.15 and 0.85). Most of the items outside the expected range were very easy; only one item needs revision for being too difficult. As with Cronbach's alpha, these items may reach the expected difficulty levels in a broader and more heterogeneous sample.
Regarding discrimination measured by point-biserial correlation, only 26% of the items reached the cutoff of 0.35 indicated by the test authors. However, most items (30, or 70% of the total) showed significant point-biserial correlation coefficients, even though they did not meet the stipulated criterion. Thus, most of the items can be considered within expectations in terms of discrimination. This is supported by the D indices: only 11 items with a significant point-biserial correlation (p<0.05) had D<0.28.
Based on these results, 14 items were expected to require more substantial revision because they were not considered culturally equivalent. In addition, the analysis of differential item functioning showed that four of the eight items with DIF had been classified as psychometrically adequate. These results indicate that a more detailed analysis of the semantic content of the items is needed to identify the possible causes of this effect. Therefore, 18 items will need more systematic review before being administered to the larger standardization sample. The remaining items (classified as A or B) can be considered psychometrically adequate. In sum, 58% of the items in this study showed psychometric adequacy relative to the items of the American version.
Note that this study followed the steps commonly recommended in Brazilian studies for translating and adapting foreign tests. However, the results showed that these procedures were not sufficient to make TOPL-2 fully suitable for the Brazilian population. Linguistic and cultural issues must be reviewed, and further studies should use larger samples.
The next research steps will review the semantic aspects of the items that were psychometrically inadequate in terms of difficulty, discrimination, and differential functioning. It is important to note that the test manual reports only overall and age-group statistics. The lack of data on the psychometric quality of individual items may therefore hinder the necessary adjustments.
In addition, once the items are psychometrically adequate, studies should verify the equivalence of the information they assess by investigating special populations, such as individuals with different oral or written communication disorders involving impairments in the pragmatic function of language. It is expected that a standardization study of the Brazilian version of this instrument, with proven cultural, linguistic, and psychometric equivalence, can soon be carried out.

CONCLUSION
Despite the limitations of this study in terms of sample size and variability, these results are promising for the psychometric adequacy of the Brazilian version of TOPL-2: almost half of the items showed equivalence with those of the American version.

Figure 1. Distribution of TOPL-2 scores for the Brazilian sample

Table 1. Descriptive statistics by age group, gender, and total, with Cronbach's alpha (α) and correlation values for this study's sample compared with the original study
Note: *p<0.05; NS = nonsignificant; SD = standard deviation

Table 2.
Note: α = alpha values with the item excluded; *p<0.05; **p<0.01; A = items psychometrically adequate relative to the original version; B = items with values close to the desirable ones; C = items needing review

Table 3. Descriptive statistics of items by gender, difference between the means of girls and boys, difference between the R² values of the models (Nagelkerke R²), and DIF classification of items