Assessing the physical violence component of the Revised Conflict Tactics Scales when used in heterosexual couples: an item response theory analysis

Although there are psychometric evaluations of the Revised Conflict Tactics Scales (CTS2) when applied to heterosexual relationships, none has used item response theory (IRT). To address this gap, the present paper assesses the instrument’s physical violence subscale. The CTS2 was applied to 764 women who also responded for their partners. Single dimensionality assumption was corroborated. A 2-parameter logistic IRT model was used for estimating location and discriminating power of each item. Differential item functioning and item information pattern along the violence continuum were assessed. Gender differences were detected in 3 out of 12 items. Item coverage of the latent trait spectrum indicated little information at the lower ends, while plenty in the middle and upper ranges. Still, depending on gender, some item overlaps and regions with gaps could be detected. Despite some unresolved problems, the analysis shows that the items form a theoretically coherent information set across the continuum. Provided the user is aware of possible drawbacks, using the physical violence subscale of the CTS2 in heterosexual couples is still a sensible option.

Domestic Violence; Questionnaires; Reliability and Validity

Reichenheim ME et al. Cad. Saúde Pública, Rio de Janeiro, 23(1):53-62, jan, 2007


Introduction
The increase in family violence has generated a growing body of research on the development of measurement tools, including the class of instruments known as Conflict Tactics Scales 1,2,3,4,5,6,7. These are based on 'conflict theory' and aim to identify strategies used by individuals to solve disagreements and impasses 8,9. The first instrument in the series (CTS1) was proposed in the late 1970s and sought to address any kind of violent relationship 10. Despite its encouraging evaluations over the years and successful use in at least 20 countries 3,11,12,13,14,15, a more refined instrument was subsequently developed for dealing exclusively with intimate partner violence. The Revised Conflict Tactics Scales (CTS2) was presented in 1996 and consists of 39 items forming five scales, namely negotiation (6 items), psychological aggression (8 items), physical violence (12 items), sexual coercion (7 items), and injury (6 items) 13.
The CTS2 has been studied since its release. Results show acceptable reliability, validity, and a factor structure in tune with the underlying theory 13,16,17,18,19. However, when focusing on intimate partner violence in heterosexual couples, such psychometric evaluations have exclusively used classical test theory 20. In order to address this gap, the aim of the present paper is to provide a more in-depth item analysis than is possible with traditional psychometric procedures.
Because item response theory techniques offer more insight than those of classical test theory, more refined issues may be addressed, such as examining the properties of individual items in order to improve the scale's efficiency and reliability in mapping the full continuum of intimate partner violence 20,21. Although there are two other studies about the Conflict Tactics Scales using item response theory, one evaluates an adapted edition of the CTS2 to measure physical violence in the context of male same-sex relationships 22, while the other relates exclusively to the CTS1 23. Moreover, little is known about the relative contribution and efficiency of the CTS2 constituent items in capturing the range of violence severity according to specific subgroups of interest. This issue also deserves attention since it is desirable for a scale to be group-independent in order to have breadth and general expediency. From an empirical perspective, insofar as possible the role played by each component item should be invariant with respect to age, gender, and so on.
As a first development of this more detailed scrutiny of the CTS2 applied to heterosexual couples, the present article focuses solely on the physical violence subscale. Three main issues are examined: (1) gender differences, assessing whether the items display similar properties regardless of the perpetrator (the woman versus her male partner); (2) the ability of items to differentiate between individuals/couples possessing different levels of the violence trait, both in terms of their location along the construct continuum and their discriminating capacity; and (3) the extent to which items provide useful and reliable information about intimate partner violence along the violence continuum, which implies searching for possible item gaps or redundancies along the spectrum. An important assumption underlying the appropriateness of the item response theory models used here is also addressed, namely by seeking to replicate previous studies indicating that the scale on physical violence is effectively measuring a single dimension 24. However, unlike previous studies, full information factor analysis is employed to deal with the dichotomous nature of test items 25.

Methods
The present study is subsidiary to a hospital-based case-control study exploring the relationship between violence within families of pregnant women and premature childbirth. Further details of the study population and field procedures may be found in Reichenheim & Moraes 26. The effective sample used in the present analysis includes 764 couples. The physical violence subscale is part of a formally adapted Portuguese version of the CTS2 27,28. Items refer to the respondent (woman) and, by proxy, to her male partner or ex-partner. The recall period covers 6 to 9 months, depending on gestational age. Item contents are presented in the first table of the Results section. Complete wordings can be found in Straus et al. 13. Readers are referred to Moraes et al. 27 for the complete Portuguese version.
The study was formally approved by the Research Ethics Committee of the Rio de Janeiro Municipal Health Department in conformance to the principles embodied in the Declaration of Helsinki. Participation in the study followed free informed consent. Confidentiality of information was guaranteed. All the women received orientation on where to seek help if they felt it was necessary.

Statistical analysis
As a first step, one-factor and two-factor full information factor analyses 25 were conducted to determine the extent to which a single latent dependence factor accounted for a common variance among items. The solution used tetrachoric matrices and was implemented in TESTFACT 29.
Differential item functioning was evaluated to address how the items behaved between genders 24. The method used here follows Thissen et al. 30. The procedure uses a 1-parameter logistic (1PL) item response theory model and aims to detect interactions between gender and the item location parameters ($b_i$). These indicate at what level of the latent variable an item is able to differentiate between a positive and negative response, and equal the latent score ($\theta_s$) at which half the respondents answer positively to the item. The 1PL model predicting the probability of endorsement for person $s$ on item $i$ is defined as

$$P(x_{si} = 1 \mid \theta_s, b_i) = \frac{\exp(\theta_s - b_i)}{1 + \exp(\theta_s - b_i)} \qquad (1)$$

Recognizing that the present analysis takes women's responses as "reference" and regards partners as the "focus" group 31, the procedure involves calculating

$$b^*_{i(p)} = b_{i(p)} - \left(\bar{b}_{(p)} - \bar{b}_{(w)}\right), \qquad \Delta_i = b^*_{i(p)} - b_{i(w)},$$

where $\bar{b}_{(w)}$ and $\bar{b}_{(p)}$ are the means of all location estimates for women and partners, respectively; $b^*_{i(p)}$ are the individual adjusted estimates regarding the second group; and $\Delta_i$ are the differential item functioning estimates obtained by subtracting the effectively estimated parameters for women, $b_{i(w)}$, from the adjusted $b^*_{i(p)}$ values. Standard normal test statistics were used to assess the significance of $\Delta_i$.
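The adjustment just described can be illustrated with a short Python sketch. It is not the BILOG-MG procedure itself: the location estimates and standard errors are invented placeholders, the function name `dif_estimates` is hypothetical, and the standard error of each difference is assumed to combine the two group-specific standard errors as if they were independent.

```python
import math
from statistics import mean

def dif_estimates(b_women, b_partners, se_women, se_partners):
    """Return a list of (delta_i, z_i) pairs, one per item.

    The partners' (focus group) locations are shifted by the difference
    between the group means before being compared with the women's
    (reference group) locations.
    """
    shift = mean(b_partners) - mean(b_women)
    pairs = []
    for bw, bp, sw, sp in zip(b_women, b_partners, se_women, se_partners):
        b_adjusted = bp - shift          # adjusted focus-group location
        delta = b_adjusted - bw          # DIF estimate for this item
        se_delta = math.sqrt(sw ** 2 + sp ** 2)  # assumes independent estimates
        pairs.append((delta, delta / se_delta))
    return pairs

# Hypothetical estimates for three items (not taken from the study).
b_w = [1.0, 2.0, 3.0]
b_p = [1.5, 2.0, 3.5]
se_w = [0.1, 0.1, 0.2]
se_p = [0.1, 0.1, 0.2]

results = dif_estimates(b_w, b_p, se_w, se_p)
for item, (delta, z) in enumerate(results, start=1):
    flag = "DIF" if abs(z) > 1.96 else "no DIF"
    print(f"item {item}: delta = {delta:+.3f}, z = {z:+.2f} ({flag})")
```

Because the focus-group locations are centred on the reference-group mean, the differences sum to zero across items, so an item is flagged only when it departs from the overall shift between groups.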
Once the items involving differential item functioning were uncovered, a 2-parameter logistic (2PL) model was considered for the main item response theory analysis. The 2PL model is given by

$$P(x_{si} = 1 \mid \theta_s, a_i, b_i) = \frac{\exp\left[a_i(\theta_s - b_i)\right]}{1 + \exp\left[a_i(\theta_s - b_i)\right]} \qquad (2)$$

where $b_i$ and $\theta_s$ are defined in equation (1) and $a_i$ represents item discrimination. Larger values of $a_i$ correspond to steeper item characteristic curves (ICC), indicating that an item is discriminating increasingly well between levels of the construct. Items with differential item functioning include separate $b_i$ and $a_i$ estimates for each gender group, while those not involving differential item functioning are common to both. The choice of a 2PL model was based on a formal comparison with the 1PL model using a likelihood ratio test of statistical significance 30. Generally, the difference between the -2log(likelihood)s associated with each model is given by

$$G^2_{diff} = \left[-2\log L_{(1PL)}\right] - \left[-2\log L_{(2PL)}\right],$$

which is distributed as a chi-square with $df = df_{(1PL)} - df_{(2PL)}$. In the present data, $G^2_{diff} = 5386.070 - 5333.460 = 52.61$ with $df = 14$ ($p < 0.0001$), which favors the 2PL model.
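The model comparison can be checked with a few lines of Python. This is only a sketch: the two -2log(likelihood) values and the 14 degrees of freedom are those reported in the text, while the helper `chi2_sf_even_df` is a hypothetical name for a closed-form chi-square tail probability that holds for even degrees of freedom only (a statistics library would normally be used instead).

```python
import math

def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variate (even df only).

    For df = 2m the tail is exp(-x/2) * sum_{k=0}^{m-1} (x/2)^k / k!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive even df")
    half = x / 2.0
    term = 1.0   # k = 0 term of the series
    total = 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total

neg2ll_1pl = 5386.070   # -2 log-likelihood, 1PL model (from the text)
neg2ll_2pl = 5333.460   # -2 log-likelihood, 2PL model (from the text)
df_diff = 14            # difference in degrees of freedom (from the text)

g2_diff = neg2ll_1pl - neg2ll_2pl
p_value = chi2_sf_even_df(g2_diff, df_diff)
print(f"G2_diff = {g2_diff:.2f}, df = {df_diff}, p = {p_value:.2e}")
```

Under these assumptions the tail probability comes out well below 0.0001, which is consistent with preferring the 2PL model.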
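The root mean square-fit statistic was used to explore item fit 24,32. This statistic is given by

$$\mathrm{RMS}(z_i) = \sqrt{\frac{1}{N}\sum_{s=1}^{N} z_{si}^2},$$

where $N$ is the number of subjects and $z_{si}$ is the standardized residual defined as

$$z_{si} = \frac{x_{si} - P_{si}}{\sqrt{P_{si}(1 - P_{si})}}$$

and, in turn, $x_{si}$ is the observed status of subject $s$ on item $i$ (either 0 or 1). The variance of $x_{si}$ is $P_{si}(1 - P_{si})$, with $P_{si}$ the model-derived conditional probability of endorsement obtained from Equation 2.

The third main goal, namely to examine the amount of item information along the violence continuum, was addressed through the test information 24. In a 2PL model, this is given by

$$TI(\theta_s) = \sum_{i=1}^{n} I_i(\theta_s),$$

which is a function of the information curve for item $i$ at trait level $\theta_s$, i.e., $I_i(\theta_s) = a_i^2 P_{si}(1 - P_{si})$, summed over the $n$ items. Note that, since $TI(\theta_s) = 1/(SE(\theta_s))^2$ 24, this is also an indicator of the instrument's precision at $\theta_s$.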
The item response theory analysis was carried out in BILOG-MG 31. Item response curves shown in Figures 1 and 2 were obtained in Stata 8.2 33.

Results
Table 1 provides the factor structure of the subscale. Irrespective of perpetrator, the full information factor analysis indicates that almost all loadings are above 0.6 in model A and in the first factor of the two-factor B model. Focusing on the woman as perpetrator, nearly two-thirds of the total variance is accounted for by the first factor, whereas the second "explains" less than 7%. An even more striking pattern is found for partners.
The differential item functioning results are in Table 2. Three items present differential functioning (i-1, i-8, and i-9) as judged from the statistical significance of the $\Delta_i$ estimates. Note that negative values emerge when the adjusted location estimates for the "focus" group ($b^*_{i(p)}$) are lower than those for the "reference" group ($b_{i(w)}$), a situation following a lower endorsement frequency in the latter. Thus, from a psychometric standpoint, a negative value indicates that a given act captures a higher level of the trait (violence) when perpetrated by women. Concentrating on the statistically significant differential item functioning items, this is the case for beating up (i-8) and grabbing the other (i-9). Conversely, throwing something that may hurt (i-1) is less common in men than in women, which entails a higher $\theta_s$ when carried out by the former.
Acknowledging three items with differential item functioning, the main item response theory estimates are shown in Table 3. Standard errors are relatively small for both types of estimates, even for item 11, which had relatively few subjects endorsing it and the least satisfactory fit (last column). All but item 11 and item 1 for partners present discriminations ($a_i$) above 2.0.
The "steepness" of each item may be better examined in Figure 1, which shows the ICC obtained from the fitted estimates. In addition, the curves convey the item coverage of the latent trait spectrum, indicating that there is very little, if any, information below $\theta_s = -1$. This is expected since the mean of $\theta_s$ is approximately zero and there are not many positive responses to the items.
A similar picture is expressed by the location estimates presented in Table 3 (second column) extending from $b_3 = 1.060$ to $b_{11} = 3.503$, and more markedly, by the gender-specific information functions (curves) shown in Figure 2. Notice that information peaks at $\theta_s = 1.6$ for women and $\theta_s = 1.7$ for partners.
Figure 2 also displays the positioning of items along the violence continuum, where item overlaps and regions with gaps may be best inspected. Irrespective of gender, there is some sparseness at the upper spectrum between i-6 and i-11, which adds to the previously alluded absence of information taking place at the lower ends. Conversely, items 5, 7, and 12 tend to cluster in a very narrow region, precisely where information is most abundant.
Besides those "common" aspects, some features also emerge as a consequence of differential item functioning. Focusing on the three items, $b_1$ and $b_8$ are much farther apart than $b_9$, the first

Discussion
The present study supports a single-factor solution for the physical abuse scale of the CTS2. Even if this finding agrees with several previous reports 16,17,18,22,23,28,34,35,36, it has the strength of being based on a full information factor analysis. Note that this result is central to what follows, since a single dimensionality of a scale being scrutinized is essential to render item response theory interpretable 24. All item slopes were reasonably large, indicating that the items discriminate well between levels of the construct. Most ICC tend to cover about one unit interval of $\theta_s$ (the violence continuum). This indicates that the majority of subjects endorsing a given item are quite similar in their underlying latent trait. The same trend has been found in other studies on the CTS using item response theory 22,23. This is also true for the mapping of items along the violence continuum 22,23, which, without further qualification, seems to be quite reasonable. Still, a few points require some thought, since this mapping does not happen along the entire range, given that information on violence is confined exclusively to the upper half of the spectrum. As a result, moderately violent subjects may go unnoticed in practice. Failure to detect violence-positive individuals can have public health consequences if incipient cases that could escalate in the future are missed. In addition, from an epidemiological research perspective, misclassifying couples as "negatives" could lead to the attenuation of risk measures if the variable is the target exposure 37,38 or to residual confounding if included in a model as a confounder 39,40. Further research could identify items to fill this gap.
Several gaps and item overlaps also occur along the continuum where information is available. Besides, the item distribution pattern is somewhat distorted by differential item functioning, especially for women. While not markedly influencing the information curves (Figure 2), some redundancies are still found. One may thus ask whether certain items might need refinement or replacement or should even be withdrawn altogether. Of course, precisely which items is a matter for debate and future investigations.
Between partners, the effect of differential item functioning is more on the ordering of items than on distribution along the continuum. While there is a fairly even spread, an apparent inconsistency vis-à-vis the theoretically expected item build-up may be spotted, which, incidentally, is preserved among women. Anticipated at the lower end, item 1 (threw something that could hurt) now occupies the middle range of the spectrum. The same "misplacement" was found by Regan et al. 22 studying same-sex male partners. One may conjecture that, although representing a more "trivial" act for women, the item has another meaning for men, tending to be performed mostly by those "bearing" higher violence levels in this group. Another idea to explore is that the act is culturally "feminine" in essence and is simply less used by men (whether violent or not). At any rate, according to the present study the component items are definitely not gender-invariant. While acknowledging that the acts allegedly perpetrated by partners were obtained by proxy responses and that spousal consensus on the CTS scales may not always be optimal 41,42,43,44, the very presence of differential item functioning can raise concern. Depending on gender, the same enactment may indicate different levels of violence, whether factual or related to a given item idiosyncrasy. Regardless, the ramification is that an item may not be grasping the same intensity of violence. This is particularly important if the use of raw scores is foreseen in practice because a given count may not actually correspond to identical levels of the construct being tapped. From a psychometric perspective, such a discrepancy detracts from the scale's "universality", which recognizably is a quite desirable (if not essential) property of any measurement tool.
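The concern about raw scores can be illustrated numerically. The Python sketch below uses invented 2PL parameters, not the study's estimates: two items are common to both groups, and a third is assumed to sit lower on the continuum for women than for partners, loosely mimicking the direction described for item 1. The function names are hypothetical.

```python
import math

def p_2pl(theta, a, b):
    """Probability of endorsing an item under the 2PL model."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def expected_score(theta, items):
    """Expected raw score: sum of endorsement probabilities at theta."""
    return sum(p_2pl(theta, a, b) for a, b in items)

# Hypothetical (a_i, b_i) pairs.
common = [(2.0, 1.5), (2.0, 2.0)]     # items without DIF
women = common + [(2.0, 1.0)]         # DIF item located lower for women
partners = common + [(2.0, 2.5)]      # same item located higher for partners

theta = 1.5                           # identical trait level in both groups
score_w = expected_score(theta, women)
score_p = expected_score(theta, partners)
print(f"expected raw score at theta = {theta}: "
      f"women {score_w:.2f}, partners {score_p:.2f}")
```

At the same trait level the expected counts differ between groups, so an identical raw score would correspond to different positions on the violence continuum depending on who the perpetrator is.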
Several issues are worth considering when assessing the present findings. First, one can contend that the results emerge in a different cultural milieu than that of the instrument's original development (USA), thus potentially limiting the generalizations. However, as indicated in the Methods section, the CTS2 version employed here underwent a careful cross-cultural adaptation 27,28 based on strict procedural guidelines 45,46. Even so, given the instrument's widespread use, it would be interesting to have more studies using item response theory models replicated on the scales of the original CTS2 in English, as well as on other versions besides Portuguese.

[Figure 2: gender-specific information curves and item positions along the violence continuum. Panels: Women; Partners. Item labels: pushed and shoved (i-3), threw something (i-1), beat up (i-8), grabbed (i-9), burned or scalded (i-11), choked (i-6), used a knife or gun (i-4), twisted arm or hair (i-2), kicked (i-12), slapped (i-10), punched/hit (i-5), slammed against the wall (i-7).]

Another important point when generalizing the findings is that the present study was restricted to pregnancy. This not only implies a shorter recall period than the usual 12 months, but may also be portraying an atypical situation. Although this still has to be confirmed, one can envisage certain events (items) happening more often outside of pregnancy, and as a consequence, item response theory results showing better coverage of the instrument along the violence continuum. It should be borne in mind that $b_i$ is also a function of the endorsement proportion: the more frequent the item, the lower the respective estimate 24. Again, the research program on the CTS2 would gain from further studies using item response theory models applied to data from a general heterosexual population, which thus far has surprisingly never been done.
In addition to the suggestions made so far, an interesting development would be to conduct a more refined evaluation of the CTS2 using item responses coded into more than two categories. Noting that each level bears its own discriminating power and position in ordered-response item response theory models 47, identifying those with good psychometric properties could help reduce the total number of component items and ultimately improve the instrument's operational capability. Readers should be reminded that if used in its current full edition (78 items), application of the CTS2 in one sitting is tiresome for all concerned, even if responses are limited to dichotomous answers.
As emphasized previously, this study is the first in a series aiming to explore the component scales of the CTS2 applied to heterosexual couples using item response theory methods. Thus, further studies are needed to examine the other subscales, particularly psychological aggression. A more in-depth scrutiny of this subscale might shed light on the violence spectrum missed by the physical abuse subscale and effectively serve as a useful complement rather than merely "one more" tool in the early detection of intimate partner violence.
Despite the limitations noted above and some still untapped questions, the item response theory analysis reported here shows that the CTS2 items on physical violence define a one-dimensional trait and form a theoretically coherent information set across the continuum, at least where there is actual coverage. Provided the user is aware of some possible gender discrepancies and that not all empirically negative cases are necessarily void of violence, use of the physical violence CTS2 subscale in heterosexual relationships is still a sensible option.
In the item fit statistic, $P_{si}$ is the model-derived conditional probability of a subject $s$ endorsing an item $i$, calculated from the parameters estimated in Equation 2. Values of $\mathrm{RMS}(z_i)$ outside the range of -2 to +2 may indicate that the overall item fit is questionable.

Figure 1
Common item characteristic curves (ICC) for items without differential item functioning (DIF) (a); gender-related ICC for items with DIF (b-d).

Table 1
One-factor (Model A) and two-factor (Model B) full information factor analysis. Results are given by gender (type of perpetrator) and sorted by appearance in the original form.
Note: Ordering follows the appearance in the English edition of the CTS2, in which items on physical violence are randomly placed among those of the other subscales and items pertaining to respondents and partners as perpetrators receive consecutive numbers 13. The original numbering system is provided in brackets.

Table 2
Differential item functioning estimates ($\Delta_i$). Items sorted by appearance in the original form.

Table 3
2-parameter logistic (2PL) model estimates, including separate $b_i$ and $a_i$ for three items presenting gender differentials (DIF). Sorted by increasing intensity of violence ($b_i$).
* Items with DIF identified according to perpetrator. In brackets: W: women; P: partners (men); ** In brackets: standard errors; *** Item root mean square fit statistic.