Validity of the TAT in Brazil: Theoretical and Methodological Issues

Paulo, SP, Brasil
ABSTRACT – Although the Thematic Apperception Test (TAT) is popular in Brazil, demonstrating evidence of its validity remains a challenge. In the present article, we discuss this issue by analyzing the definition of the TAT as a projective method and a psychological test, its use by different theoretical traditions, the relations between nomothetic and idiographic levels of analysis, the limitations of Classical Test Theory for evaluating the instrument’s properties, and the challenges regarding research and practice with the instrument in Brazil. We argue that overcoming a traditional view of projective techniques, using multidimensional methods, and conducting broader empirical studies on norms and validity evidence with multicenter databases may allow more secure and informed practices with the instrument among researchers and practitioners.

Since its final version in 1943, the use of the Thematic Apperception Test (TAT; Murray, 1943/2005) in research and practice has posed challenges regarding the extent to which data from this instrument inform about personality characteristics, as well as the reliability and validity of interpretations drawn from such data. On one hand, the great variety in the use of the TAT reflects the diversity of psychological paradigms and theories. In this regard, for example, the Journal of Personality Assessment recently dedicated two special sections (Jenkins, 2017a; Stein & Siefert, 2018) to the TAT and related instruments. On the other hand, evidence supporting such uses is required as a condition for acknowledging the instrument as scientific. In this context, arguments from different (and, at first glance, opposed) epistemological bases present possibilities and limitations related to theory and research, as well as a discussion of the instrument's scientific status. The present essay reviews such arguments in order to discuss the challenges of the TAT in the Brazilian context. This context has specific needs, because the country's Federal Council of Psychology (Conselho Federal de Psicologia [CFP]) regulates the use of psychological tests by practitioners and has recently updated its technical criteria for allowing the use of these instruments (CFP, 2018). These criteria require that authors report (1) the constructs assessed by the instrument, (2) evidence justifying its adoption, (3) recent studies with Brazilian samples on the instrument's psychometric properties, and (4) its scoring and interpretation system.
In the next sections, we discuss the challenges for the TAT to meet the criteria described by the CFP (2018), as well as the theoretical and empirical evidence suggesting that this instrument has strong potential for reaffirming its scientific value and for encouraging its use among Brazilian researchers and practitioners. More specifically, we discuss the definition of the TAT as a psychological test, its theoretical model (and the evidence that supports it), and issues not well addressed by research (especially in Brazil), namely the integration of nomothetic and idiographic approaches to the instrument and how to demonstrate its psychometric properties.

TAT: PROJECTIVE OR SELF-EXPRESSION TECHNIQUE?
The very denomination of the TAT has been a matter of debate. The TAT is traditionally considered a projective technique, but this term has been criticized for its association with psychoanalysis (which gives the false impression that such techniques are exclusive to that theory), as well as for the conceptual limitations of the processes assumed to occur during responding (Meyer & Kurtz, 2006; Bornstein, 2007). Rietzler (2006) suggests that instruments like the Rorschach and the TAT be referred to as self-expression techniques, in contrast to psychometric or self-report ones. Meyer et al. (2017) avoided the term "projective technique" by describing the Rorschach simply as a problem-solving task.
It is interesting to note that, while the term self-report is widely adopted, research on self-expression techniques still uses the term "projective technique or method". This situation is apparently due to the popularity of such terms; also, the concept of projection, although initially described in psychoanalytic theory, does not necessarily refer to a pathological process (Anzieu, 1981; Verdon et al., 2014), which legitimates such techniques for the study of personality in general.
Although the arguments of Meyer and Kurtz (2006) and Bornstein (2007) help to avoid misleadingly associating such instruments only with psychoanalysis, we will use the term "projective technique" in the present study, due to its popularity. We acknowledge, though, that the debate on the psychological processes underlying responding to projective techniques involves theoretical models with different assumptions about personality functioning and structure, whose comparison exceeds the scope of the present paper. Nevertheless, we briefly present the original theoretical framework of the TAT (as well as its main recent derivations), which shows that non-psychodynamic researchers and practitioners have been adopting TAT cards regardless of their association with that theoretical model (for an introduction to such diversity, see Jenkins, 2008).
Murray (1943/2005) proposes what more recent literature refers to as the projective hypothesis for the TAT. According to this account, storytelling content during the test allows the expression of "dominant drives, emotions, sentiments, complexes and conflicts of a personality" (Murray, 1943/2005, p. 3). More specifically, the same author states that the TAT task depends on people's tendency to interpret situations according to prior experience and current motivations, so that they express personal content in stories, with varying degrees of consciousness. Murray (1943/2005), though, defines this process as apperception. In this sense, projection (as used in psychoanalytic theory) and apperception would be related (although independent) phenomena, in terms of their degree of subjectivity. Projection refers to the attribution of internal psychological content (i.e., fantasies, expectancies, and motivations) to external stimuli. In other words, it implies a subjective interpretation of external reality (without necessarily distorting it), based on internal variables of which the individual may not be conscious. Pathological levels of this process imply a distortion of the meaning of reality, as internal content could prevail over other stimuli. Apperception, in turn, also consists of a subjective interpretation; it is related to the effect of prior experience on the interpretation of a new one, especially in complex situations such as interpersonal ones. Thus, both conscious and automatic processes, such as cognition, perception and prior learning, mediate apperception, which has led theoretical traditions other than psychoanalysis to adopt TAT cards for the study of personality (see, for example, Blankenship et al., 2006; Jenkins, 2008; Annotti & Teglasi, 2017).

THEORETICAL FOUNDATIONS OF THE TAT
One traditional approach to the TAT that derives from this understanding is the research tradition on the measurement of motives, classically achievement (nAch), affiliation (nAff) and power (nPow), based on the pioneering work of David McClelland and John Atkinson (Cramer, 2004). More recently, Tuerlinckx et al. (2002) described models for such measurement, which were evaluated using IRT modeling. Essentially, a TAT card would arouse the expression of a need as a function of the card's intrinsic characteristics and the person's base level of that need. Tuerlinckx et al. (2002), though, state that this process would not be linear, but a drop-out one, in which the activation of need-related content in storytelling may fail to occur even though a card has an instigating force for such activation.
Jenkins (2017a) points out that the term "projective", when applied to the TAT and similar tests, is often regarded as meaning non-objective and intuitive, and argues that the term "narrative assessment techniques" (p. 227) be used instead. By doing so, other approaches could benefit from adopting TAT cards. The same author mentions the most validated developments in this direction, such as Phebe Cramer's Defense Mechanism Manual (DMM; Cramer, 2004), Drew Westen and colleagues' Social Cognition and Object Relations Scale (SCORS; Westen et al., 1990), and the more recent SCORS-G (Stein et al., 2015; Stein & Slavin-Mulford, 2018). Even though other approaches also have recent validity evidence in the literature (see, for example, Jenkins, 2008; Annotti & Teglasi, 2017), these two systems rely on coding manifest (rather than latent) story content, which reduces the risk of subjectivism, which is, in turn, a common target of criticism from non-psychoanalytic researchers.
The DMM is based on the psychoanalytic concept of defense mechanisms and measures the presence and intensity of three such mechanisms (denial, projection, and identification). To do so, content is coded by counting the elements of each mechanism that are present in manifest story content. Thus, the presence of such elements informs on the level of these mechanisms, a procedure whose validity and reliability have been extensively tested empirically (see Cramer, 2015, for a review of such evidence). The SCORS-G is the third version of the original SCORS and assesses eight dimensions (plus a global scale), namely: Complexity of Representation of People (COM), Affective Quality of Representations (AFF), Emotional Investment in Relationships (EIR), Emotional Investment in Values and Moral Standards (EIM), Understanding of Social Causality (SC), Experience and Management of Aggressive Impulses (AGG), Self-Esteem (SE), and Identity and Coherence of Self (ICS). The rationale for these scales derives from contributions of both psychoanalytic and social cognition theory and research; it was initially developed by Westen (1991) and proposed as an integrative approach. Recent work on the SCORS-G includes a special section of the Journal of Personality Assessment (Stein & Siefert, 2018), as well as a book describing theoretical assumptions and supporting empirical data (Stein & Slavin-Mulford, 2018).
In summary, the TAT can be considered a performance-based instrument, responding to which is complex and informative about several personality-related variables. In this sense, the validation of the instrument should refer not only to the systems found in the literature, but also to how the instrument's stimuli are expected to function (in this regard, see, for example, Cramer, 2017; Keiser & Prather, 1990; Scaduto, 2016; Schwartz & Caride, 2004a, 2004b; Siefert et al., 2016). One issue in this regard is the definition of the TAT as a psychological test, which is of particular interest for its use by Brazilian psychologists. We discuss this issue in the next section.

TAT (AND ITS VARIATIONS) AS A PSYCHOLOGICAL TEST
The definition of the TAT as a psychological test (or, at least, a method or technique) is especially important in Brazil, where the professional legislation on psychological practice states that the use of psychological methods and techniques is an exclusive attribution of psychologists (CFP, 2018). Also, defining the TAT as a psychological test implies that, before its applied use is allowed, empirical evidence must be provided for the country's sociocultural context. Finally, the importance of this definition relates to the concept of psychological tests as stated by Urbina (2007), namely, systematic procedures for obtaining behavior samples related to cognitive or affective functioning, which are compared against certain standards. Therefore, the definition of the TAT as a psychological test must be precise for the discussion proposed herein.
In this sense, we argue for considering the TAT a test as long as its use refers to its 20-card application, as stated in its original manual (Murray, 1943/2005). Also, the TAT should be considered a test only if some performance standards (i.e., norms) are available. It is important to note that, to date, the TAT's original manual (Murray, 1943/2005) is the only version of the instrument approved for professional use in Brazil by the CFP's Assessment System of Psychological Tests (Sistema de Avaliação dos Testes Psicológicos [SATEPSI]; CFP, 2018), even though this manual provides only vague normative data. Other thematic apperception tests included in SATEPSI are Leopold Bellak's Children's Apperception Test, in both versions (CAT-A and CAT-H; respectively, Marques et al., 2013a; 2016), and the same author's Senior Apperception Test (SAT; Marques et al., 2013b).
Compared to Bellak's CAT-A/H and SAT, the TAT has more cards, and its original use consists of showing 20 cards to the person assessed (Murray, 1943/2005). Such an arrangement is widely regarded as unreasonable and too long for concise assessment processes (see, for example, Aronow et al., 2001). In the literature on the TAT, most of the studies mentioned in the next paragraphs used subsets of TAT cards rather than the complete version, even though they do offer validity evidence (Meyer, 2004; 2017a; Siefert et al., 2016).
It is important to note that variations on the original application of the instrument are justifiable, considering its long duration. However, we did not find studies using subsets of TAT cards that offered empirical support for the choice of cards, other than references to manifest card content (see, for example, Annotti & Teglasi, 2017; Aronow et al., 2001; for a critique of this issue, see Keiser & Prather, 1990; Siefert et al., 2016; Vane, 1981).
Although such studies could inform on shorter versions of the TAT, the lack of standardization compared to the original application means that they do not offer systematic evidence on the instrument's properties (Keiser & Prather, 1990). In addition, several authors developed alternate cards or new card sets, so that Jenkins (2008; 2017a) refers to them as thematic apperception techniques, reserving the term "Thematic Apperception Test" for the original cards developed by Murray (1943/2005), which are copyrighted. However, with regard to the original TAT, there is to date no recent validity evidence for alternate versions of the instrument, at least from large studies.
In conclusion, although the TAT is defined as a psychological test in Brazil, there is no recent evidence supporting its validity in the country (for a review of such studies, see Lelé, 2018; Scaduto & Barbieri, 2013; Scaduto, 2016). Also, although the instrument's 20-card form is unreasonable due to its length (Aronow et al., 2001; Cramer, 2004), there is no recent evidence that supports the use of reduced forms.

MAIN EXAMPLES OF TAT VALIDATION STRATEGIES
Besides the DMM and SCORS-G, mentioned when we described the theoretical foundations of the TAT, the work of the Parisian (or French) school of TAT is an example of sound validation of a TAT system. In this school, a subset of cards is used, chosen on the basis of psychoanalytic theory and the clinical experience of the school's pioneer authors (Lelé, 2018; Verdon et al., 2014). Although these authors refer to the importance of norms for comparing the frequencies of categories, the emphasis of the Parisian system lies on idiographic data, especially in conjunction with the Rorschach. Such validation goes in the same direction as what Tavares (2003) described as clinical validity, which consists of the enhanced value that isolated instruments or techniques acquire for clinical purposes when they are adopted together.
The concept of clinical validity is an alternative for reducing the gap between clinical and research approaches in Psychological Assessment. Although this debate has seen important arguments recently (see, for example, Jenkins, 2014; 2017a; 2017b; Barbieri, 2008), a consensual solution for this gap is not yet consolidated in the Psychological Assessment literature. Therefore, to date, without empirically testing the properties of a subset of TAT cards, the criteria for accepting such a choice are valid only at the level of theoretical assumptions.
In the same direction, it is important to remember that a notion such as "TAT validity" makes no sense, either for this instrument or for any other in Psychological Assessment. Validity refers to interpretations drawn from data (in the case of the TAT, interpretive systems), and not to the test itself (American Educational Research Association et al., 2014; Urbina, 2007). It is important to note that this argument makes it possible to address the challenge described earlier for the Parisian school of TAT, as well as for similar approaches.
Therefore, the following discussion will refer to systems and measures that derive from the use of TAT cards, and not to the cards themselves. In this sense, it must be noted that TAT cards can be compared only in terms of general performance indicators, as the cards were built to cover different issues in personality functioning (Murray, 1943/2005). Although this assumption justifies the adoption of card subsets, research on the issue should present empirically based criteria for doing so, rather than justifying choices based only on the premise of Murray (1943/2005).
Alves (2006) discussed empirical evidence and theoretical arguments on the validity of projective techniques, with an emphasis on the TAT and the Human Figure Drawings. She pointed out the complexity of such validation, recalling the argument of Anzieu (1981), who asserts that validating projective techniques is more a matter of validating a hypothesis than of validating an instrument, thus requiring a more complex research program. Primi et al. (2009) reach a similar conclusion, stating that the validation of a test implies objectifying a psychological theory and checking the correspondence of observed facts with theoretical expectations. In this sense, such correspondence is not necessarily tied to a specific source of validity evidence, but is based on the use of several research resources. This difference is important with regard to psychological testing, where scientific status is considered assured provided the data fit the assumptions of a particular measurement model, be it Classical Test Theory (CTT) or Item Response Theory (IRT), for example. As we discuss in the next sections, the psychometric evaluation of TAT data has historically led to the conclusion that several TAT systems are not valid (for a review of such critique, see Jenkins, 2017b), which may have more to do with an incorrect adoption of such models than with limitations of the systems themselves.

NOMOTHETIC AND IDIOGRAPHIC LEVELS OF TAT DATA
With regard to TAT nomothetic data, studies from the last two decades have adopted both a traditional psychometric approach and a critical attitude toward it. Blankenship et al. (2006) and Tuerlinckx et al. (2002), for example, showed that it is possible to carry out studies with TAT cards and/or other thematic apperception stimuli using IRT techniques for the measurement of motives. In the same direction, recent advances have been demonstrated in the measurement of psychodynamic constructs, also inspired by other theoretical models, such as the DMM and SCORS-G, which were previously mentioned in the section on the TAT's theoretical foundations. Such examples show that it is possible to develop both idiographic and nomothetic approaches, with their respective possibilities and limitations. Although a myriad of coding and interpretation systems is documented (Alves, 2006; Jenkins, 2008), the DMM and SCORS/SCORS-G are the most cited approaches in the TAT validity debate, due to their large body of empirical and psychometric evidence (Alves, 2006; Cramer, 1999; Meyer, 2004; Stein & Slavin-Mulford, 2018). Tavares (2003) states that the TAT is better regarded as an idiographic approach to personality functioning; in the same direction, Jenkins (2017b) discusses the utility of TAT norms in terms of their low generalizability, suggesting that nomothetic approaches to the instrument make little sense. It is important to note that Jenkins (2017b) also states that research should look for statistical significance of TAT measures, even if based on studies with few participants, so that evidence of their validity can be further demonstrated. This apparent contradiction indicates that the nomothetic and idiographic levels of the TAT remain an issue, at least under the (false) assumption that the interests of researchers and practitioners are non-complementary.
When emphasizing the idiographic nature of the TAT, authors point out the instrument's value when used with other assessment techniques (Annotti & Teglasi, 2017; Jenkins, 2014; 2017a; Tavares, 2003). In the same direction, the value of the TAT in psychological assessment procedures is largely supported by clinical practice (Jenkins, 2008; 2017b; Silva, 2011; Tavares, 2003; Verdon et al., 2014). Nevertheless, much of the criticism of interpretive systems for this instrument refers to their poor psychometric quality and the lack of nomothetic data to support them.
Such arguments are a reminder that the debate on the TAT's potential and limitations can become blurred if the specificities of different fields are placed on the same level. In this sense, it is important to note that much of the debate would be more productive if it focused on different aspects of the TAT's properties as a projective method, rather than on the intrinsic value of this test (or similar ones). Such confusion seems to have maintained a vision of the TAT as stated by Vane (1981), that is, the clinician's delight and the statistician's nightmare. We understand that this allegory is informative at its core: these two professionals have different interests and practices, even though they are undoubtedly related.
In this sense, clinicians will tend to assert the value of the TAT for describing idiographic data, which contributes to higher quality in Psychological Assessment procedures (Annotti & Teglasi, 2017; Jenkins, 2014; 2017a; 2017b; Tavares, 2003). On the other hand, those who emphasize the importance of psychometric indicators, as traditionally conceived in Psychological Testing (Nunnally, 1978, for example), will tend to point to the TAT's weak indexes, obtained in studies using the same psychometric methodologies as for self-report techniques (for a review of such criticism, see Cramer, 2004; Jenkins, 2017a, for instance). This apparent contradiction reflects different expectations of projective techniques, which correspond to the difference between idiographic and nomothetic approaches in psychological research and practice, which are in turn based on different epistemological assumptions and objectives.
Clinicians will be interested in the contribution of such techniques to comprehension and decision making, along with other sources of information, which pertains to Psychological Assessment. Psychometricians will be interested in the specific properties of each instrument, which, in turn, pertains to Psychological Testing. Primi (2012) showed that these approaches serve specific needs (respectively, professional practice and research), which are related in indirect ways: professional practice must be based on research, but applying conclusions drawn from nomothetic data is not as straightforward as desired by those unaware of the need to contextualize such data when considering individual cases. It must be noted that integrating idiographic and nomothetic data is a constant need (and challenge) when considering individual cases (which is the usual applied use of research data). However, this task does not exclude the need to show the adequacy of instruments such as projective ones in terms of methodologically sound research (which is the usual practice of researchers, rather than of applied professionals).
Haase et al. (2010) discuss the use of nomothetic and idiographic approaches in neuropsychological assessment, showing that both have limitations. Namely, nomothetic approaches can lack evidence of construct validity (although quantitative methods such as factor analysis are available in this direction), because the specificity of measures derived from nomothetic research demands a contextualization that such an approach cannot provide. On the other hand, idiographic approaches demand the consideration of many information sources (i.e., interview, testing, observation), in order to test hypotheses and plan interventions best tailored to each case. In order to deal with such limitations, Haase et al. (2010) defend the complementary use of both approaches, so that hypothesis testing can rely on the comparison with typical performance indicators, which can be contextualized by integrating other information sources.
The same rationale is valid for projective methods in general and the TAT in particular. Although different research and practice traditions tend to emphasize the importance of one approach (to the detriment of the other), we argue that these levels of validity are different and that nomothetic approaches inform about regularities of performance, which helps in understanding its idiographic characteristics.
As pointed out by Urbina (2007), Psychological Assessment and Psychological Testing differ in terms of objectives and data treatment, and are thus different (and not directly comparable) contexts of research and practice, although highly related and interdependent. It is important that this argument does not reinforce an already worrying separation between these contexts, such as the one observed between general clinical practice and research, whose mutual, harmful effects have been discussed by Barbieri (2008) and Jenkins (2008; 2017b). In this sense, for Psychological Assessment (and idiographic approaches), data from projective techniques are one source of information among others that will be complementary, which calls for research on the properties of such procedures. In the case of the TAT, efforts have already been made to obtain empirical data in this direction (Annotti & Teglasi, 2017), but, to date, few studies have covered this issue (Jenkins, 2014; 2017b; Tavares, 2003). On the other hand, Psychological Testing calls for research on the properties of instruments, a design in which the role of different resources for decision-making is not usually an issue. Although recent research on the TAT has covered such aspects, data from both kinds of research are necessary for a wider appreciation of this instrument's value (in this sense, see Jenkins, 2017a, who describes several suggestions for better research on the TAT in general).
The need for well-designed research is especially pressing in Brazil, considering the lack of recent studies about the instrument in the country. At the same time, although international research offers a more positive panorama in terms of evidence on the instrument's possibilities (see, for example, Annotti & Teglasi, 2017; Jenkins, 2008; Stein & Slavin-Mulford, 2018), some flaws persist, recalling the questions formulated by Keiser and Prather (1990), who assert that information is scarce on what exactly each TAT card assesses.
One response to this problem is that validity refers to conclusions drawn from data (in the present case, TAT interpretive systems), so that the properties of the stimuli would be a secondary issue. However, without a clear account of such properties, the extent to which data can be explored may remain unclear. In order to deal with this problem, recent efforts have shown the importance of card properties for eliciting content that refers to an interpretive system's construct of interest (Cramer, 2017; Siefert et al., 2016).
Another option for describing card properties is normative studies, which can provide evidence on the properties of each card and on possible effects of using card subsets other than the original application described in Murray (1943/2005). Recent studies do not seem to address such issues, which could inform on the limitations and possibilities of using TAT cards as a test, and/or its stimuli, whether alone or in specific sets. With the exception of the study by Ávila-Espada (2000) in Spain and some efforts in Argentina (Schwartz & Caride, 2004a, 2004b), no recent normative data were found in the literature, which calls for such efforts, especially in Brazil, where normative studies provide the basis for the inclusion of an instrument in SATEPSI (CFP, 2018).
On the other hand, Jenkins (2017a) offered important arguments for questioning the usefulness of TAT normative data for clinicians. She states that norms for the instrument relate to interpretive systems, rather than to its cards; also, such data would have only narrow use in cutoff-based decisions for psychopathology, for example, a level for which the TAT would be inappropriate. Finally, Jenkins (2017a) notes that TAT normative data have low generalizability, due to cultural specificities to which storytelling is highly sensitive. Such arguments are reminders of the risk of misusing such data; however, we argue that developing norms for the TAT is a valid effort, with advantages that go beyond simple, quantitative diagnostic classification.
Norms do not refer to the idiographic level, but to comparison with people similar to the one under assessment. We agree that, in the case of the TAT, such comparison is of little use for the constructs commonly assessed with this instrument. However, before such assessment, it is important to know the typical performance on each card, especially for broad, multidimensional interpretive systems such as the ones described in Verdon et al. (2014), Murray (1943/2005) and Scaduto (2016). Such performance should comprise formal aspects of storytelling, such as the omission or distortion of details, common themes, level of detail, word count and average time of narratives. Although such indicators alone are of little use or even misleading, as stated by Cramer (2004) for word count, it is their configuration that can help in understanding individual performance in relation to that of similar persons.
The comparison of a person's performance to norms should inform about how this person relates to the culture and demographics (i.e., age, gender, educational level) he or she is part of. In this sense, we argue for developing TAT norms that are local and culture-related, a possibility praised by Jenkins (2008; 2017a) as a valid strategy. At the same time, empirical data on typical performance can help update clinicians' impressions, especially when these are based on the observation of clinical groups only. Scaduto (2016), for example, observed that some of the apperceptive omissions or distortions considered clinically significant by Murray (1943/2005) are in fact common among persons from a non-clinical sample in Brazil. Such an observation (derived from a normative study) calls for reconsidering the details that make a difference in inference making, for which TAT data offer valuable information.
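As a rough illustration of how formal indicators could be compared with local norms, the sketch below expresses a respondent's values on one card as z-scores relative to a normative sample. It is only a sketch under stated assumptions: the indicator names (`word_count`, `story_time_s`), the function `z_profile` and all numbers are hypothetical and are not taken from any published TAT norms.

```python
from statistics import mean, stdev

def z_profile(person, norm_sample):
    """Return one z-score per indicator: (value - norm mean) / norm SD."""
    profile = {}
    for indicator, value in person.items():
        norm_values = [row[indicator] for row in norm_sample]
        profile[indicator] = (value - mean(norm_values)) / stdev(norm_values)
    return profile

# Hypothetical local norms for a single card (invented numbers, illustration only).
norm_sample = [
    {"word_count": 80, "story_time_s": 100},
    {"word_count": 100, "story_time_s": 120},
    {"word_count": 120, "story_time_s": 140},
]
# Hypothetical respondent from the same demographic group.
person = {"word_count": 140, "story_time_s": 120}

profile = z_profile(person, norm_sample)
for indicator, z in profile.items():
    print(f"{indicator}: z = {z:+.2f}")
```

In line with the argument above, no single z-score would be interpreted in isolation; it is the configuration of indicators, read against the person's culture and demographics, that would carry information.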

PSYCHOMETRIC PROPERTIES OF TAT: THEORETICAL AND RESEARCH-RELATED LIMITATIONS
Although studies show the possibilities of sound measures derived from TAT (Ávila-Espada, 2000;Cramer, 2015;Stein & Slavin-Mulford, 2018), some authors (Cramer, 2004;Holt, 1999;Jenkins, 2017b) problematize the use of traditional psychometric resources as the only way to assess and demonstrate validity and reliability. In summary, critical studies on TAT psychometric validation defend that validation and response process on projective techniques work differently than for self-report ones (Anzieu, 1981;Cramer, 2004). They also remember that traditional psychometric (in this case, CTT) techniques depend on assumptions that do not apply directly to projective techniques. More specifically, classic reliability estimation methods (retest, split-half, parallel forms, internal consistency) do not apply to TAT, due to its nonlinear or item-oriented structure, as expected in studies that considered TAT cards as items (Alves, 2006;Cramer, 2004;Jenkins, 2017a;see Hibbard et al., 2001, though, for an important defense of showing internal consistency for SCORS).
Jenkins (2017b) and Tuerlinckx et al. (2002) discuss the reliability of measures derived from the TAT, stating that its cards were not developed with internal consistency in mind; that is, they were planned to cover a wide range of situations, which are not necessarily related to one another across cards. In other words, some measures across TAT cards seem to have low internal consistency because they do not relate well among themselves, but they relate strongly to personality constructs, as they cover the multidimensionality of such constructs. Research on the cards' ability to elicit different (and unrelated) responses (which is referred to as card pull; Cramer, 2017; Siefert et al., 2016) endorses this view, advocating for the pertinence of choosing (and justifying) card sets. Jenkins (2017b) and Tuerlinckx et al. (2002) also point out that internal consistency is related to the number of items and is affected by what CTT defines as random measurement error, a notion that rests on a premise of construct stability. Such a premise must be considered contextually in the case of personality, as understood by dynamic theories (e.g., Verdon et al., 2014), and in light of the different pull of TAT cards, so that a simplistic, direct estimation of alpha can lead to values that do not adequately measure internal consistency. An alternative for this apparent problem is the development of specific constructs within personality, as shown by Hibbard et al. (2001), and the use of constructs whose assumptions of stability are met (see, for example, Annotti & Teglasi, 2017).
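The dependence of alpha on the number of items and on how strongly items relate to one another can be illustrated with a minimal sketch. It is only an illustration under stated assumptions: the function names and the example values (a mean inter-card correlation of .10 and sets of 10 or 20 cards) are hypothetical and do not come from any of the cited studies. The first function computes Cronbach's alpha from a respondents-by-cards score matrix; the second uses the standardized-alpha formula, which depends only on the number of cards and their mean inter-correlation.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of rows (one row per respondent,
    one score per card). Uses sample variances throughout."""
    k = len(scores[0])
    item_vars = [variance([row[j] for row in scores]) for j in range(k)]
    total_var = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


def standardized_alpha(k, mean_r):
    """Standardized alpha from the number of items (k) and the mean
    inter-item correlation (mean_r)."""
    return k * mean_r / (1 + (k - 1) * mean_r)


# Hypothetical values: cards that correlate weakly with one another (r = .10)
alpha_10_cards = standardized_alpha(10, 0.10)
alpha_20_cards = standardized_alpha(20, 0.10)
print(round(alpha_10_cards, 3), round(alpha_20_cards, 3))
```

Under these assumed values, doubling the number of cards raises alpha even though the cards relate to one another no better than before, which is one reason why a low alpha for a short, heterogeneous card set says little by itself about the quality of the measure.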
In the same direction, Cramer (2004) argues that traditional psychometric assumptions about reliability, namely, trait immutability (when considering test-retest conditions) and internal consistency (using Cronbach's alpha), do not fit the TAT, as its cards are intentionally designed not to overlap, and overlap is a condition for homogeneity. However, Cramer (2004) did not consider contemporary discussions of reliability and internal consistency. Revelle and Zinbarg (2009), for example, stated that reliability refers to the correlation between two (ideally) identical tests. In the absence of such a condition, this correlation can be estimated from the internal structure of a test. The same authors suggest that reliability measures derived from factor analysis (e.g., indexes based on factor loadings) offer a better appraisal of this property than alpha. Thus, reliability does not depend on the assumption of trait immutability, but rather on how well items relate to a construct. Revelle and Zinbarg (2009) also note that reliability based on Cronbach's alpha presents several problems, even though it is still widely adopted. Regarding the TAT, Lundy (1985) showed that, in a test-retest condition, correlations were acceptable when participants were instructed that they did not necessarily have to produce a new story for the same cards. Lundy (1985) also showed that alpha values in that condition were lower than the test-retest correlation, an unexpected result under CTT assumptions regarding alpha (for example, Nunnally, 1978). Jenkins (2017a; 2017b) addresses the alleged low reliability of measures across cards, stating that storytelling is a different task from responding to an item. Consequently, one cannot assume regularities in constructs such as motivation intensity or persistence of preoccupations, due to individual variation, as well as to the variation in the situations and details that the cards display.
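The contrast between alpha and a reliability estimate based on factor loadings can be sketched as follows. This is a hedged illustration, not a reproduction of any analysis in the cited works: it assumes a single-factor model with standardized items, uses coefficient omega (total) as the loading-based estimate, and relies on hypothetical loadings chosen only to show the effect of unequal loadings across cards.

```python
def omega_total(loadings):
    """Coefficient omega for a one-factor model with standardized items:
    common variance of the sum score divided by its total variance."""
    common = sum(loadings) ** 2
    unique = sum(1 - l ** 2 for l in loadings)
    return common / (common + unique)


def model_implied_alpha(loadings):
    """Cronbach's alpha computed from the correlation matrix implied by
    the same one-factor model (off-diagonal entries are products of loadings)."""
    k = len(loadings)
    # Total variance of the sum score: k unit variances plus all covariances
    covariances = sum(loadings) ** 2 - sum(l ** 2 for l in loadings)
    total_var = k + covariances
    return (k / (k - 1)) * (1 - k / total_var)


# Hypothetical loadings: cards that relate unequally to the construct
loadings = [0.8, 0.6, 0.4, 0.2]
print(round(omega_total(loadings), 3), round(model_implied_alpha(loadings), 3))
```

With these assumed loadings, alpha comes out lower than omega, because alpha treats all cards as equally related to the construct. When the loadings are equal, the two coefficients coincide.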
On the other hand, more stable constructs will tend to show better reliability (Annotti & Teglasi, 2017; Hibbard et al., 2001). Jenkins (2017b) also notes that different TAT measures consist of either scales or indexes. While scales consist of correlated items that are assumed to have a similar effect in eliciting the expression of a trait, indexes consist of items that are not necessarily correlated (or equivalent) but that, together, define the construct due to their high content validity (for example, socioeconomic status defined by educational level, occupational status, income, and residential area; Jenkins, 2017b). This difference leads to different strategies for evaluating the soundness of a measure, which relate to the definition of the construct and its component indicators, as well as to their relations (Fried, 2017).
Compared to self-report techniques, projective ones tend to approach personality in a more dynamic way, that is, one in which several dimensions work together to explain behavior, and in which the configuration of such aspects tends to be more informative than their presence/absence or intensity considered separately (Verdon et al., 2014). Also, for dynamic theories of personality such as those that originally inspired the creation of projective techniques, isolated dimensions make little sense for explaining behavior. Instead, such theories aim to explain complex levels of personality functioning, in terms of multivariate (rather than unidimensional) models. Although multidimensional psychometric models are well established in the literature, such as Confirmatory Factor Analysis (Brown, 2015), IRT methods (see, for example, Ackerman, 2005; Hartig & Höler, 2009; Reckase, 2009), Network Analysis (Schmittmann et al., 2013) and Structural Equation Modeling (Kline, 2015), studies applying such methods to the TAT do not exist to date.
In order to deal with this complexity, an alternative for validating projective techniques could be developing simpler, unidimensional versions of such instruments (Anzieu, 1981). This is the case of the TAT, from which more specific measures, other than multidimensional systems such as those of Verdon et al. (2014) and Murray (1943/2005), have been developed (see Cramer, 2015; Stein & Slavin-Mulford, 2018, for example). Also, several alternative cards and sets have been developed, with important advances for assessing more specific constructs and populations, the main examples being the Contemporized-Themes Concerning Blacks Test (Hoy-Watkins & Jenkins-Monroe, 2008) and the Tell-me-a-Story test (Costantino et al., 2007). It is interesting to note that such versions offer an alternative to the TAT cards, whose historical marks (i.e., their "old" and "dark" aspects, as well as their portrayal of typical situations in the USA of the 1930s and 1940s) have been subjected to criticism (Parada & Barbieri, 2011; Jenkins, 2017a; 2017b). Regarding the characteristics of the original TAT cards, two studies showed that, in Brazil, the "old" aspect seemed not to affect performance. Silva (1989) did not observe differences in stories' characteristics between groups presented with "old" and "modern" cards, while Scaduto (2016) reported that, on average, participants mentioned that the cards seemed old in less than 5% of the cases, for 12 of the 20 cards presented.
The discussion on the inadequacy of simplistic psychometric analyses is not exclusive to research on projective techniques. Several authors claim that, without a clear understanding of psychometric concepts and their underlying statistical models, the mere use of quantitative estimations can become an uncritical exercise, due to an excessive reliance on such strategies as the only estimates of an instrument's reliability and validity. In this sense, Damasio (2012) and Gouveia et al. (2009) advocate for a critical analysis of the sample, the type of measure, and whether the assumptions of statistical models are satisfied before adopting specific estimators, instead of using them just because of their popularity among researchers.
In the same direction, Gouveia et al. (2009) and Pasquali (1997) also point out that statistical tools are indexes of how well a theory fits observed data (i.e., model fit), but the choice of which tools to use is determined by psychological theory. Without such care, measures can offer a false image of objectivity and scientificity, an aspect widely discussed and criticized in the quantitative-qualitative debate (see Gelo et al., 2008, for instance). In the case of the TAT, Jenkins (2008; 2017a; 2017b) points out that more carefully designed research can provide a more accurate picture of the properties of the TAT and similar techniques, especially with regard to construct definition and to variable modeling and control, so that more detailed numerical analyses can be performed, even with small samples.

TAT VALIDITY IN BRAZIL: CHALLENGES FOR FUTURE RESEARCH
In the present study, we proposed to review arguments on how to improve the quality of research and practice with the TAT, in order to both improve and encourage its use. Considering the CFP's policies for approving an instrument's use by practitioners, the topics above addressed the constructs that TAT cards can cover, as well as international research on the validation of its analysis systems, given the particularities of validation research on projective techniques.
Regarding research on the instrument in Brazil, though, limitations are more frequent than potentialities, namely, the small number of researchers involved with TAT research, the established perception of the usefulness of the instrument in the country and the lack of joint efforts to overcome the challenges such research imposes. Recent research on the TAT in Brazil is scarce, except for some efforts of ours and of fellow researchers (Mishima-Gomes et al., 2014; Scaduto, 2016; Scaduto et al., 2015; Scaglia et al., 2018) and a recent article on validity evidence for the Parisian school (Lelé et al., 2014). However, such studies do not cover the issues discussed in the previous topics, especially validation (Scaduto & Barbieri, 2013; see Scaduto, 2016, though, for an initiative in this direction). Also, the only analysis system for the TAT available to practitioners (i.e., included in the CFP's SATEPSI) is the original one (Murray, 1943/2005), for which validity studies used small samples (Herzberg, 1993; Miranda, 2000; Silva, 1989).
In light of the present situation of research and practice with the TAT in Brazil, research should benefit from setting up an empirical database on the complete, 20-card form of the instrument, based on standardized instructions, so that researchers could study performance and content features of storytelling, as well as card-specific features, in the Brazilian population. As stated above, a previous initiative of ours in this direction (Scaduto, 2016) is, to date, an isolated effort. We understand that, although such a database is an effort toward nomothetic data on the instrument, it can also provide data for better-informed analyses to be explored at the idiographic level. It is important to note that such an effort applies not only to Brazil but also abroad, as it can address the limitation of research based on small samples, which is still the rule for TAT studies in general (Jenkins, 2017a; Scaduto & Barbieri, 2013; Scaduto, 2016). In this sense, we advocate that larger, multicenter databases can provide data for more complex analyses. An example in this direction is an ongoing, worldwide normative data collection on the Rorschach's R-PAS (Meyer et al., 2019). Also, efforts toward larger databases on the TAT would allow testing different interpretive systems, an issue barely explored in the scarce research on the instrument.
Researchers and practitioners should bear in mind that projective techniques consist of using ambiguous stimuli or instructions to elicit free, open responses, which is the basic difference between such techniques and self-report ones (which use objective stimuli and response formats). Both kinds of technique will activate several psychological processes, although the results of some of these will be more determinant (or observable) in responding than others. In this sense, research on both should offer an account of which processes are more likely to occur during responding, and under what conditions such occurrences will explain and predict behavior.
The same can be said about the false opposition between the nomothetic and idiographic levels of TAT data analysis and research (Haase et al., 2010; Scaduto, 2016; Tavares, 2003). The nomothetic level informs about the extent to which an individual's behavior relates to the expectations of his/her culture (Jenkins, 2017b), and therefore about its formal aspect. The idiographic level, on the other hand, describes behavior in terms of personal syntheses of cultural experience, and therefore in terms of its content (Annotti & Teglasi, 2017; Jenkins, 2014; 2017b; Scaduto, 2016). In this sense, considering and integrating such levels allows what Winnicottians describe as filling a transitional level of the individual's relationship with the culture he/she is part of (Barbieri, 2008).
Considering the need to demonstrate the adequacy of instruments such as the TAT for the investigation of personality characteristics, it is possible to say that, although much of the criticism of such instruments has been overcome in the international literature, much is still to be done, especially in the Brazilian context. Although there are positions contrary to the CFP's policies on the use and inclusion of instruments in SATEPSI (Silva, 2011), better regulation of such use provides opportunities for discussing and improving research and practice, in order to narrow the gap between them. By facing the challenges of implementing sound research and practice with the TAT, Brazilian psychologists (and not only them) will be able to perform a more ethical, scientifically grounded and clinically refined practice.