Psychopathy Checklist: Youth Version psychometric properties in an Item Response Theory polytomous model

This study assessed the applicability of the Psychopathy Checklist: Youth Version in a sample of teenagers confined in socio-educational institutions. Using an Item Response Theory approach, the item properties of this instrument were analyzed with the generalized partial credit model. Eight of the twenty items of the original instrument were discarded due to low discrimination parameters. As expected, the most discriminating items in the assessment of psychiatric traits


Keywords: Psychological assessment; Psychometrics; Item Response Theory; Antisocial personality disorder.
Personality disorders are persistent patterns of behavior that deviate significantly from the expectations of the individual's culture and are pervasive and inflexible (American Psychiatric Association, 2013). Psychopathy is considered a serious personality disorder characterized by behavioral and affective problems that can generate a greater tendency towards criminality, although such acts do not, by themselves, define the disorder (Zepinic, 2017). In general, such behaviors tend to be directly associated with a difficulty in generating adequate empathic reactions in an interpersonal context (Robinson & Rogers, 2015). Current findings also suggest that violence committed by psychopaths tends to be more instrumental, and therefore geared to personal advantage, rather than merely reactive (Dhingra & Boduszek, 2013).
Although this disorder can only be diagnosed in adulthood, adolescents with psychopathic traits show a heightened sense of grandiosity, self-centeredness, manipulative ability, emotional insensitivity, lack of empathy, lack of remorse, and irresponsibility, among other antisocial tendencies (Vahl, Colins, & Lodewijks, 2014). High levels of antisocial traits in adolescents are also associated with violent acts and episodes in which the individual is often in conflict with the law (Loeber, Burke, & Pardini, 2009; Salekin & Frick, 2005).
One of the tools designed for the diagnosis of psychopathy is the Psychopathy Checklist-Revised (PCL-R), developed by Robert Hare and considered the "gold standard" in the evaluation of psychopathy (Vitacco, Neumann, & Jackson, 2005). This instrument aims to comprehensively evaluate the different symptoms that characterize psychopathy, measuring affective, interpersonal and behavioral characteristics. In general terms, this checklist brings together twenty separate items, each scored on a three-point ordinal scale according to the degree to which the subject's behavior and personality match the description presented in the manual (Hare, 2003). In Brazil, the PCL-R was adapted in the study by Morana (2004).
However, controversies about the most appropriate factorial structure for the PCL-R have recurred as the instrument has been used in different cultures and countries (Vitacco, Neumann, Caldwell, Leistico, & Van Rybroek, 2006). A systematic review performed by Hauck Filho, Teixeira, and Almeida (2014) showed, for example, that the two-factor model is still common in the recent scientific literature, although it is not the only one reported in studies that applied confirmatory factor analysis to this checklist.
Inspection of the items in the Psychopathy Checklist: Youth Version (PCL: YV) shows that many of them came directly from the PCL-R, with little or no change. International studies rate the validity and reliability of the instrument as adequate (Forth, Kosson, & Hare, 2003), but its predictive ability in relation to criminal recidivism has been scrutinized (Olver, Stockdale, & Wormith, 2009; Shepherd & Strand, 2016), and comorbidities may also decrease the predictive value of instruments, including the PCL: YV (Khanna, Shaw, Dolan, & Lennox, 2014). In men, the second factor ("socially deviant lifestyle") is predominantly responsible for the predictive value. In women, an equivalent pattern is inferred (Cope, Ermer, Nyalakanti, Calhoun, & Kiehl, 2014), despite the lack of truly conclusive data in this regard (Walters, 2014).
In the McCuish, Mathesius, Lussier, and Corrado (2018) study with Canadian natives and young Caucasians, PCL: YV outcomes were similar between the two groups, although the first group was overrepresented in the penal system. However, in the work of Sitney, Caldwell, and Caldwell (2016), African Americans showed significantly higher scores and recidivism rates than young Caucasians, even when controlling for other influences. In cultures other than the North American one, in particular, there is still a significant lack of such studies (Pechorro, Barroso, Maroco, Vieira, & Gonçalves, 2015). Regarding the association between PCL: YV scores and other variables, the data obtained with this instrument are similar to those reported in studies with the adult version, including the relationship between symptoms of psychopathy and drug abuse (Vincent, Cope, King, Nyalakanti, & Kiehl, 2018).
The development of more reliable measures to assess psychopathic traits in young people has shown that these same traits tend to be more present in individuals with chronic criminal trajectories (McCuish, Corrado, Lussier, & Hart, 2014; Tsang, 2018), as well as other risk factors (Chauhan et al., 2014). In addition, the cognitive aspects that mediate between scores and future violent behavior also need to be better understood (Kahn, Ermer, Salovey, & Kiehl, 2016; Walters & DeLisi, 2015).

Method

Participants
The study sample consisted of 255 male adolescents in conflict with the law, evaluated while complying with a socio-educational measure of deprivation of liberty or while in provisional detention. Data were collected from 2008 to 2010.
The inclusion criteria used were: age between 12 and 19 years old; being in socio-educational condition of deprivation of liberty or in provisional detention at Fundação de Atendimento Socioeducativo (FASE, Socio-Educational Care Foundation); voluntarily agreeing to participate in the investigation. On the other hand, exclusion criteria were: active psychotic signs or symptoms, intellectual disability or hearing impairment clinically detected or described in the medical records; refusal to complete the application of the instruments at any stage of the process.

Instruments
Hare's Psychopathy Inventory: Youth Version (Gauer, Vasconcellos, & Werlang, 2006) includes an interview guide, a technical manual, and a response form to be completed by the evaluator, consisting of 20 items that investigate affective, interpersonal, and lifestyle aspects. Brazilian studies, although incipient, indicate adequate inter-rater reliability for the PCL: YV (Ronchetti, Davoglio, Salvador-Silva, Vasconcellos, & Gauer, 2010). The instrument is scored considering the adolescent's usual functioning since late childhood/early adolescence, except for one item that also considers the functioning prior to the first decade of life. Each item is scored on a three-point scale: zero (0) when the characteristics are not present, oppose, or are inconsistent with the intent of the item; one (1) when the item partially applies but not to the degree required for a higher score, when there is uncertainty about whether the item applies, or when there are conflicts between sources of information that cannot be resolved in favor of 0 or 2; two (2) when the item applies perfectly in its most fundamental aspects. Exceptionally, when there is not enough information to score an item, the item's score may be omitted, but omissions may never exceed five out of twenty.
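The scoring rule just described can be sketched as a small validity check. This is only an illustration: the function name `summarize_ratings`, the use of `None` to mark an omitted item, and the plain sum of rated items are assumptions made here, not procedures taken from the PCL: YV manual.

```python
from typing import Optional, Sequence

N_ITEMS = 20      # items on the checklist, as stated in the text
MAX_OMITTED = 5   # omissions may never exceed five out of twenty


def summarize_ratings(ratings: Sequence[Optional[int]]) -> dict:
    """Check one protocol of item ratings (0, 1, 2, or None when omitted)."""
    if len(ratings) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} ratings, got {len(ratings)}")
    for r in ratings:
        if r is not None and r not in (0, 1, 2):
            raise ValueError(f"invalid rating: {r!r}")
    omitted = sum(r is None for r in ratings)
    return {
        "omitted": omitted,
        "within_omission_limit": omitted <= MAX_OMITTED,
        # Plain sum of the rated items (hypothetical summary); no adjustment
        # for omitted items is applied because the text does not describe one.
        "sum_of_rated_items": sum(r for r in ratings if r is not None),
    }
```

A protocol with six or more omitted items would be flagged through `within_omission_limit` rather than silently summed.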

Data Analysis
The objectives of the analyses performed are: to obtain the PCL: YV item parameters in a generalized partial credit model, and to build a scale using anchor levels established from the selected items.

Item Response Theory
Although it includes behavioral aspects, the construct called psychopathy cannot, of course, be directly observed. In this context, statistical models of latent traits (variables that cannot be directly observed but are inferred from observable variables) are one of the main objects of investigation for quantitative methods in the social sciences. In particular, Item Response Theory (IRT) relates a person's probability of correctly answering an item to a combination of that person's ability and the attributes of that item (Andrade, Tavares, & Valle, 2000). For Pasquali (2003), IRT adopts two fundamental axioms: 1) performance on a test item is the effect and latent traits are the cause; 2) the relationship between performance and the latent trait can be described by a monotonically increasing function called the Item Characteristic Curve (ICC).

Interpretation
The parameters are interpreted as follows. For Ayala (2009), a "reasonably good" discrimination parameter (α) varies approximately between 0.8 and 2.5, while Baker and Kim (2017) classify as moderate an α between 0.65 and 1.34. The discrimination of each item reaches its maximum at the item's difficulty level. The difficulty parameter β corresponds to the point on the ability scale where P(θ) = 0.5: when an item is easy, this occurs at a low ability level; when it is difficult, it occurs at a high level. These ability levels are usually expressed in standard deviations from the mean, on a scale with mean 0 and standard deviation 1; each level is then given a qualitative interpretation (so that the presentation of the data becomes more pedagogical), and the levels are then called anchor levels.
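To make the roles of α and β concrete, the sketch below uses a two-parameter logistic curve for a dichotomous item. The logistic form, the function names `icc` and `icc_slope`, and the parameter values are assumptions chosen for illustration; they are not estimates from this study. The statement that discrimination peaks at the difficulty level is read here as the curve being steepest at θ = β.

```python
import math


def icc(theta: float, alpha: float, beta: float) -> float:
    """Probability of a positive response under a 2PL logistic curve."""
    return 1.0 / (1.0 + math.exp(-alpha * (theta - beta)))


def icc_slope(theta: float, alpha: float, beta: float) -> float:
    """Derivative of the curve with respect to theta."""
    p = icc(theta, alpha, beta)
    return alpha * p * (1.0 - p)


# Hypothetical items: one easy (low beta) and one difficult (high beta).
for label, beta in (("easy", -1.0), ("difficult", 1.5)):
    alpha = 1.2
    print(label, icc(beta, alpha, beta), icc_slope(beta, alpha, beta))
```

At θ = β the probability is 0.5 and the slope equals α/4, its largest value, so a larger α gives a steeper curve around the item's difficulty.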
The number of anchor levels is not known a priori. To determine at which level an item is an anchor (or even whether an item is an anchor at any level), Beaton and Allen (1992) indicate the following conditions: 1) the item is answered positively by a large percentage of individuals (at least 65%) at this ability level; 2) it is answered positively by a smaller percentage of individuals (at most 50%) at the immediately preceding ability level; 3) the difference between these two proportions is at least 0.30.
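The three conditions can be expressed as a short check over the proportions of positive answers at consecutive ability levels. This is a minimal sketch: the function name `anchor_level`, the list-based input, and the example proportions are hypothetical.

```python
from typing import Optional, Sequence


def anchor_level(
    proportions: Sequence[float],
    min_at_level: float = 0.65,
    max_at_previous: float = 0.50,
    min_difference: float = 0.30,
) -> Optional[int]:
    """Return the index of the level at which the item anchors, or None.

    `proportions[k]` is the proportion of positive answers among individuals
    at the k-th ability level, with levels sorted from lowest to highest.
    """
    for k in range(1, len(proportions)):
        previous, current = proportions[k - 1], proportions[k]
        if (
            current >= min_at_level                      # condition 1
            and previous <= max_at_previous              # condition 2
            and current - previous >= min_difference     # condition 3
        ):
            return k
    return None


# Hypothetical proportions at four consecutive levels.
print(anchor_level([0.10, 0.35, 0.72, 0.90]))
```

With these illustrative values the item anchors at the third level (index 2), whereas a flatter profile such as `[0.55, 0.70, 0.90]` anchors nowhere because the preceding proportion is always above 50%.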

Model selection
The PCL: YV is an instrument with polytomous items, which requires a model suited to this type of response. Having chosen not to assume that the items share the same discrimination (as Rasch models do), one has to choose between two polytomous models: the Generalized Partial Credit Model (GPCM) (Masters, 1982; Muraki, 1992) or the Graded Response Model (GRM) (Samejima, 1968). According to Tsang, Piquero, and Cauffman (2014), although appropriate for categorical responses, the GRM assumes that the likelihood of responding in a higher category increases as the latent trait increases.
In fact, this is expected. However, this hypothesized ordering may not be reflected in the empirical data. In polytomous items, if the probabilities do not increase along with the latent trait, at least one response category will never be the most likely at any point on the θ scale (Andrich, 1988). The assumption of ordered categories made by the GRM implies that this model is unable to detect these "disorganized thresholds" (Tsang et al., 2014). The GPCM, while making the same assumption, does not force the thresholds to be ordered.
For the interpretation of the discrimination parameter, the criterion of Baker and Kim (2017) was adopted, accepting as moderate or higher an α ≥ 0.65. The discrimination of each item reaches its maximum at the item's difficulty level. The difficulty parameter β of the item corresponds to the point on the ability scale where P(θ) = 0.5: when an item is easy, this occurs at a low ability level; when it is difficult, it occurs at a high level. In polytomous models, as in this paper, each score of an item (for the PCL: YV: 0, 1, and 2) is called a category and has its own characteristic curve. The intersection parameter δ_j of one category with the previous one indicates the latent trait level at which either score (0 or 1, 1 or 2) is equally likely. This parameter refers to the exact point of intersection, not to the category as a whole. An item with m_j categories has m_j - 1 intersection points. As Muraki (1992) arbitrarily defined δ_1 = 0, we work with δ_2 as the intersection between categories 0 and 1, and δ_3 as the intersection between categories 1 and 2, in the case of the PCL: YV. In the interpretation of anchor levels, since few items were left, criteria 2 and 3 described previously were relaxed: percentages of up to 55% at the previous ability level were accepted, as well as differences greater than 0.25, a procedure also adopted by Mafra (2011).
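The category curves and their intersections can be illustrated with a small implementation of the GPCM category probabilities. This is a sketch under stated assumptions: the function names and all parameter values are hypothetical, the scaling constant is taken as 1, and the lowest category is given an exponent of zero, which plays the role of the arbitrary δ_1 = 0.

```python
import math
from typing import List, Sequence


def gpcm_probs(theta: float, alpha: float, deltas: Sequence[float]) -> List[float]:
    """Category probabilities for scores 0..len(deltas) under the GPCM.

    `deltas` holds the intersection parameters between adjacent categories
    (for a three-category item: the 0/1 and the 1/2 intersections).
    """
    exponents = [0.0]
    for delta in deltas:
        exponents.append(exponents[-1] + alpha * (theta - delta))
    top = max(exponents)  # subtract the maximum for numerical stability
    weights = [math.exp(z - top) for z in exponents]
    total = sum(weights)
    return [w / total for w in weights]


def modal_category(theta: float, alpha: float, deltas: Sequence[float]) -> int:
    """Score that is most likely at a given latent trait level."""
    probs = gpcm_probs(theta, alpha, deltas)
    return probs.index(max(probs))


# At theta equal to an intersection, the two adjacent scores are equally likely.
print(gpcm_probs(-0.5, 1.5, [-0.5, 1.0]))

# With disordered intersections the middle score is never the most likely one.
grid = [x / 10 for x in range(-40, 41)]
print({modal_category(t, 1.0, [1.0, -1.0]) for t in grid})
```

The second example shows how the GPCM can expose disordered intersections: when the 1/2 intersection lies below the 0/1 intersection, the set of modal categories over the grid contains only scores 0 and 2.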

Parameters
The analysis with polytomous data considered the twelve items that presented a discrimination parameter (α) higher than 0.65. The categories δ of the items, analyzed with the generalized partial credit model, referred to a three-point Likert scale covering the scores "0", "1" and "2", equivalent to "The item does not apply to the individual", "The item applies to some degree, but not to the point of representing a score 2, or there are many conflicting exceptions or information" and "The item reasonably applies to the individual in the most essential aspects", respectively. Table 1 presents (a) the discrimination and location parameters of the categories of all items, estimated with the GPCM in the R software (R Core Team, 2013) through the ltm package (Rizopoulos, 2006); and (b) the recalculated parameters of the remaining items after the elimination of the items with low discrimination (1, 2, 4, 13, 14, 16, 17 and 19). Figure 1 presents the characteristic curves of the categories of the 12 selected items. Items 8, 6, and 18 presented the highest discrimination parameters (αs = 3.00, 2.00, and 1.75, respectively), resulting in steeper curves and suggesting that they are more sensitive to different levels of psychopathy traits than items 9, 3 and 11, which presented the lowest αs among the selected items (0.71, 0.74 and 0.80, respectively).
The data in Table 1 and Figure 1 also show the difficulty parameters of the categories and the latent trait range in which each response is most likely. For example, in item 9, "Parasitic Lifestyle", a score of 2 ("The item applies reasonably well to the individual in the most essential aspects") only becomes the most likely response from point 1.89 on the latent trait scale (and even then its probability is only close to 0.5); on the other hand, an individual with the same θ approaches a probability of 1 for this response on item 6, "Lack of Remorse/Guilt".
Figure 2 presents the Item Information Functions (IIFs); the IIF peaks indicate the latent trait levels at which an individual can be estimated most accurately (Baker & Kim, 2017). The smaller the amount of information, the lower the accuracy of θ estimates. Item 3, for example, is more accurate (with information close to 15) in the latent trait range between -1 and 0, and loses accuracy as it approaches the extremes. Figure 2 also shows at which points on the latent trait scale the test provides more accurate estimates of individuals. The test information is shifted to the left of the mean, and most of it lies approximately between -1 and 1 on the latent trait scale. The test provides little information about respondents located outside this range, who are present in significant numbers and whose interviews represent a considerable expenditure of time in prison surveys. In summary, these results suggest two conclusions: (a) the short version of the instrument offers levels of information similar to those of the full test and (b) the instrument is more informative at medium latent trait levels and is not indicated for use in the clinical setting.

Anchors
As stated above, by definition δ_1 = 0: the probability of obtaining a score of "0" or higher is always 1, regardless of the position on the scale. After calculating the cumulative probabilities for δ_2R and δ_3R (the probability of obtaining the score "1" or "2" at each level of the latent trait scale) and for the upper category δ_3R alone (score "2"), it was possible to establish five anchor levels: -1, 0, 1, 2 and 3. Table 2 shows a synthesis of the anchor items for the categories equivalent to the scores "1" and "2", according to the values found for δ_2R and δ_3R, and the relevant levels that determine anchor categories.
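A minimal sketch of this procedure, assuming the GPCM with a scaling constant of 1, is given below: it computes the cumulative probabilities of scoring "1 or higher" and "2" at each level and applies the relaxed anchor criteria (at least 65% at the level, at most 55% at the preceding level, and a difference of 0.25, treated here as inclusive). The function names, the item parameters, and the extra level -2 used as the level preceding -1 are assumptions for illustration only.

```python
import math
from typing import Dict, List, Optional, Sequence


def gpcm_probs(theta: float, alpha: float, deltas: Sequence[float]) -> List[float]:
    """Category probabilities for scores 0..len(deltas) under the GPCM."""
    exponents = [0.0]
    for delta in deltas:
        exponents.append(exponents[-1] + alpha * (theta - delta))
    top = max(exponents)
    weights = [math.exp(z - top) for z in exponents]
    total = sum(weights)
    return [w / total for w in weights]


def cumulative_probs(theta: float, alpha: float, deltas: Sequence[float]) -> List[float]:
    """P(score >= k) for k = 0, 1, ..., len(deltas)."""
    probs = gpcm_probs(theta, alpha, deltas)
    return [sum(probs[k:]) for k in range(len(probs))]


def anchor_levels(
    alpha: float,
    deltas: Sequence[float],
    levels: Sequence[float],
    min_at_level: float = 0.65,
    max_at_previous: float = 0.55,
    min_difference: float = 0.25,
) -> Dict[int, Optional[float]]:
    """For each score k >= 1, the level at which 'score >= k' anchors."""
    anchors: Dict[int, Optional[float]] = {}
    for k in range(1, len(deltas) + 1):
        anchors[k] = None
        for previous, current in zip(levels, levels[1:]):
            p_prev = cumulative_probs(previous, alpha, deltas)[k]
            p_cur = cumulative_probs(current, alpha, deltas)[k]
            if (
                p_cur >= min_at_level
                and p_prev <= max_at_previous
                and p_cur - p_prev >= min_difference
            ):
                anchors[k] = current
                break
    return anchors


# Hypothetical item; -2 is included only as the level preceding -1.
print(anchor_levels(2.0, [-0.5, 0.5], [-2, -1, 0, 1, 2, 3]))
```

For this hypothetical item the score "1 or higher" anchors at level 0 and the score "2" anchors at level 1; an item whose cumulative probability never meets the three conditions is simply not an anchor at any level.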
According to Table 2, individuals at level -1 are characterized by achieving a score of "1" or higher on the items "Need for Stimulation", "Parasitic lifestyle" and "Irresponsibility", which fall into the Behavioral group, and also on the items "Poor behavioral controls" and "Severe criminal behavior", which fall into the Antisocial group. Likewise, the item categories that correspond to each of the following levels are presented in succession. It is important to note that this does not mean that a respondent at a given level will necessarily respond according to this pattern, since these are probabilities: anchor levels serve to allow a more accessible and pedagogical interpretation of the data.
Table 2 also shows that the vast majority of item categories provide more information about individuals between levels -1 and 1. Only the δ_3R categories of the items "Promiscuous sexual behavior" and "Parasitic lifestyle" define levels 2 and 3, respectively. It is a concern that an instrument as complex to use as the PCL: YV, which requires a lot of time to apply and to train the surveyors, has such a limited range of information. In fact, this may represent an under-representation of precisely the portion of the population that requires more attention from mental health professionals in socio-educational institutions.

Discussion
The properties and functioning of the PCL: YV items in a sample of male adolescents in the state of Rio Grande do Sul were presented here. The results of the reviews performed through the IRT suggest that out of the 20 items of the original version of the instrument, 12 presented satisfactory discrimination parameters. It is noteworthy that the lack of adjustment of the other 8 items may be due to the few answers in those categories, considering the limited sample. Some items are relatively better at assessing different levels of psychopathy (for example, items 6 and 8: "lack of remorse/guilt" and "callous/lack of empathy"); in fact, these affective characteristics can be considered traditional in the description of psychopathy traits, and their greater variability among young people is reflected in the responses to these items. Other items, such as "Need for Stimulation", were less sensitive: once again, this may be due to a greater homogeneity of this trait among adolescents.
Some items were more extreme, i.e., a higher level of the latent trait is required for the surveyor to assign the response "The item applies to the individual reasonably well in the most essential aspects". Two items that exhibited this functioning are "Parasitic Lifestyle" and "Promiscuous Sexual Behavior," which is consistent with the manual information and PCL: YV training. Considering that most adolescents are financially dependent on their families, as well as the social vulnerability to which the target audience of the socio-educational institutions is exposed, the task of the evaluator is made more difficult and a more extreme trait becomes necessary for a high score to be given. Similarly, the item "Promiscuous sexual behavior" also presents particular challenges in the assessment: a teenager who has never engaged in sexual intercourse receives a score of zero, and even if he or she has, it is likely that only the individuals with the highest latent trait levels will be willing to talk about these experiences and describe them as impersonal or trivial.
Classification of anchor categories of items and assessment factor by latent trait level (2 of 2)
The results show that the instrument offers more information in the near-average latent trait range. The same causes that affect the discrimination and difficulty parameters discussed earlier also affect the performance of the items that were retained. However, response distortions that may affect the behavior of the items should also be considered. Response distortions are systematic tendencies to respond to a range of items on a basis that is not relevant to the construct being measured (Bandalos, 2018), resulting in biases that affect instrument accuracy and validity. Social desirability is the best-known and most studied distortion and comes in different forms: two of its components that we hypothesize to have a significant influence in this study are self-deceptive enhancement, in which the respondent perceives himself more favorably than is warranted, and impression management, in which there is a conscious effort to conceal "flaws". Another relevant distortion in the evaluation of teenagers is acquiescence, the tendency to agree or disagree with most items regardless of their content. Response distortion is not a trivial problem in psychological assessment and is particularly challenging in investigations of antisocial behavior: for future studies, we suggest the use of instruments that can detect patterns of social desirability, including other scales and implicit association tests.
In the psychometric context, the construction and comparison of different statistical models is an essential step in the development of tests and in obtaining evidence of construct validity or, in more modern terminology, internal structure validity (Messick, 1995; American Educational Research Association, American Psychological Association, & National Council on Measurement in Education, 2014; Bandalos, 2018). In IRT applications, comparison between studies is possible provided that the formulation of the models involved is respected. The following explanation is brief and therefore superficial, but technical details are available in the psychometric literature (e.g., Maydeu-Olivares, 2005). In the Thissen and Steinberg (1986) taxonomy, there are two major classes of parametric models in IRT: difference models and divide-by-total models. The main difference model is Samejima's GRM (1968). Among the divide-by-total models, the most used in applications are those that do not include a guessing parameter; three of them are considered nested (i.e., the simpler models can be obtained by constraining the more general model): Bock's nominal model (1972), Thissen and Steinberg's ordinal model (1986) and Masters' partial credit model (1982). The generalized partial credit model is equivalent to the formulation of Thissen and Steinberg. In the present study, the estimated parameters can be compared with the results of other studies that used nested models; this same comparison cannot be made with the GRM, for example (in fact, the GRM and the GPCM have the same number of parameters). However, it is noteworthy that the GRM and the GPCM are not equivalent, which allows their model fit statistics to be compared.
Leaving aside cultural differences, comparison with English-language work that also used the GPCM can be helpful. Some of the items with higher discrimination parameters in this study also presented good parameters in the study by Tsang et al. (2014), such as "Callous/Lack of Empathy" and "Lack of Remorse/Guilt", suggesting that these attributes are good indicators for differentiating between levels of psychopathy traits. Similarly, items such as "Need for Stimulation" also proved to be poorly discriminating in other studies (Tsang et al., 2014, 2015), suggesting that this is a more universal feature of adolescence and therefore less informative for the purpose of the PCL: YV. Other items, such as "Shallow emotions", presented less consistent discrimination across these studies. Besides cultural differences that could influence the evaluators' criteria for what constitutes the demonstration and the experience of emotions, it would be interesting to explore other explanations in Brazilian studies with samples of individuals from different institutions and using Differential Item Functioning.

Final Considerations
Despite the limitations, it is noteworthy that as far as the authors are aware this is the first study in Brazil to evaluate the properties of PCL: YV items. It is important to highlight, however, that these findings are related to this selected sample. Further studies with similar methods with samples from different institutions are encouraged, as well as studies that include Differential Item Functioning analyses for aspects such as age, gender, geographic region, and ethnicity. Caution is also advised in the use of PCL: YV and its scores by practitioners and investigators, who should take into account these potential differences. This study does not support the indiscriminate use of PCL: YV in assessments of young offenders in the southern region or elsewhere in Brazil.