AN INVENTORY OF THE CHARACTERISTICS OF THE MARKETING SCALES CREATED AND TESTED IN BRAZIL

ABSTRACT
Scales are used in marketing research to obtain reliable data. However, many problems arise when it comes to using a valid and reliable scale. In this context, this paper analyzes the Brazilian instruments proposed and validated in the marketing field only. For this purpose, the theory of scale development is discussed (and summarized in a figure) and, as a consequence, that theory is tested empirically. A total of 26 scales from the Brazilian marketing field were analyzed. The results suggest that greater use should be made of the alternative methods suggested by the theory, such as confirmatory factor analysis and nomological validity.

INTRODUCTION
Scales have been created and evaluated for decades in marketing (ANDERSON et al., 1987; CHURCHILL Jr., 1979). Churchill and Peter (1984, p. 360) comment that "marketing researchers have responded by making an impressive effort to develop and investigate the psychometric properties of new measures, as well as to investigate previously proposed measures of marketing constructs".
Scholars frequently need new scales for measuring new or old phenomena in marketing. Thus, constructing and testing new instruments for measuring these phenomena is critical. Moreover, some measures are created with the goal of facilitating data collection rather than measuring an object. Consequently, the results generated can be questionable.
Jacoby (1978, p. 91) goes beyond the problem of measuring faithfully and comments that "most of our measures are only measures because someone says that they are, not because they have been shown to satisfy standard measurement criteria (validity, reliability and sensitivity)". What does Jacoby's argument mean?
It means that "if a finding is significant or that the ultimate in statistical analytical techniques have been applied, [we can be doubtful of it, because] the data collection instrument generated invalid data at the outset" (JACOBY, 1978, p. 90).
As a result, marketing scholars need to constantly review, test, and evaluate their scales if progress in marketing theory is to be achieved. Based on this context, this paper has a threefold goal: to present a theoretical framework that supports scale assessment, to analyze the features of the marketing scales proposed by Brazilian researchers, and to analyze the features of the international marketing scales psychometrically tested in Brazil.
This investigation is justified by two arguments: (i) international and national researchers could use the paper as a guide for quickly finding scales for their research, and (ii) the framework proposed here could help researchers understand the theoretical concepts behind scale structures.
The paper is structured as follows. First, it discusses the theory behind scale assessment; in other words, an overview of the psychometric foundations of scales is presented. Second, it discusses the methodology used in the empirical part of the investigation, which evaluated Brazilian marketing scales published in top journals. Next, it presents the three main contributions of the paper: the framework proposed for supporting the assessment of scale validity, the new Brazilian scales proposed to marketing theory, and the international scales psychometrically tested in Brazil. Then, the results are presented, and the paper closes with discussions of the data, the paper's contributions, and directions for future research.

MEASURES INTRODUCTION
According to Peter (1979, p. 6) "[v]alid measurement is the sine qua non of science. In a general sense, validity refers to the degree to which instruments truly measure the constructs which they are intended to measure". The relevance of the instrument is so evident that "it is clear that if measurement is disregarded in marketing research, the field will be slow to advance" (RAY, 1979, p. 1). Parameswaran et al. (1979, p. 18) comment that marketing scholars "are urged to pay more attention to data [measurement] because theory construction is a product of the interaction between data and models." These authors note that there are three basic requirements of measurement. First, measurement must be an operationally definable process. Second, measurement should be valid. Third, the outcome of the measurement process must be reproducible. However, what most observers do not recognize beyond these requirements is that measurement development is not only a scientific requirement, but also a practical necessity (RAY, 1979).
It is recognized that a good scale should present a minimum set of features in order to reach strong validity. Some of these characteristics are less important than others and might therefore be discarded. However, to achieve a theoretically valid and reliable scale, most of the characteristics must be followed. The next topic discusses the most important features and concepts for assessing scale validity. Researchers should follow them in order to obtain a better instrument.

SCALE ASSESSMENT
This part of the paper explains the different types of assessment available in the theory. The available theory was reviewed and is presented in a simple framework that puts all concepts in their respective places. Future researchers might find the suggested framework useful, since it visually classifies the concepts and provides a guide to support their research. Figure 1 presents the framework and its respective subdivisions. The concepts are classified mainly as (a) reliability, (b) validity, (c) generalization and (d) applicability:

TEST-RETEST RELIABILITY
In this part of scale construction, the researcher should analyze the correlation between the same person's scores on the same set of items at two points in time (generally separated by two or four weeks). Churchill Jr. (1996, p. 405) says that "evidence of the reliability of a measure is determined by measuring the same objects or individuals at two different points in time and then correlating the scores", which is also known as test-retest reliability assessment.
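As an illustration only, the test-retest estimate described above can be sketched as a Pearson correlation between the scores obtained on the two occasions. The function name `pearson` and the scores below are hypothetical and do not come from any of the scales reviewed in this paper.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products and sums of squares around the means
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)


# Hypothetical total scale scores of five respondents, measured twice
time_1 = [10, 12, 15, 18, 20]
time_2 = [11, 12, 14, 19, 21]

test_retest_r = pearson(time_1, time_2)
print(round(test_retest_r, 3))
```

A value close to 1 would suggest that respondents keep their relative positions between the two administrations. The sketch assumes that both lists refer to the same respondents in the same order and that neither list is constant.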

ALTERNATIVE FORM
In this part, it is necessary to build two alternative (but equivalent) forms of the scale and apply them to the same sample at two points in time, with an interval of two or four weeks (MALHOTRA, 2001).

INTER-RATER OR INTER-OBSERVER RELIABILITY
It is used to assess the degree to which different raters/observers provide consistent estimates of the same phenomenon. There are two major ways to estimate inter-rater reliability. Trochim (2002) comments that the first is "if your measurement consists of categories - the raters are checking off which category each observation falls in - you can calculate the percent of agreement between the raters". The second major way to estimate inter-rater reliability is used when the measure is a continuous one. According to Trochim (2002), in that case all you need to do is calculate the correlation between the ratings of the two observers. For instance, they might be rating the overall level of activity in a classroom on a 1-to-7 scale. You could have them give their rating at regular time intervals (e.g., every 30 seconds); the correlation between these ratings would give you an estimate of the reliability or consistency between the raters.
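The two estimates mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions: the helper names `percent_agreement` and `pearson`, the category labels, and the ratings are all hypothetical.

```python
from math import sqrt


def percent_agreement(rater_a, rater_b):
    """Share of observations that two raters put in the same category."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty, equal-length lists")
    matches = sum(1 for a, b in zip(rater_a, rater_b) if a == b)
    return matches / len(rater_a)


def pearson(x, y):
    """Pearson correlation between two equal-length lists of ratings."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)


# Categorical case: hypothetical categories assigned to five observations
categories_a = ["high", "low", "low", "medium", "high"]
categories_b = ["high", "low", "medium", "medium", "high"]
print(percent_agreement(categories_a, categories_b))  # 4 of 5 match

# Continuous case: hypothetical 1-to-7 ratings given at six time intervals
ratings_a = [3, 5, 6, 2, 7, 4]
ratings_b = [4, 5, 7, 2, 6, 4]
print(round(pearson(ratings_a, ratings_b), 3))
```

Percent agreement does not correct for agreement that would occur by chance, so it is best read as a first, simple indication rather than a complete reliability estimate.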

INTERNAL CONSISTENCY RELIABILITY
It is normally expected that items composing a scale show high levels of internal consistency. In this context, some commonly used criteria for assessing internal consistency are: individual corrected item-to-total correlations, the inter-item correlation matrix for all scale items or for the items proposed to measure a given scale dimension, and a number of reliability coefficients, such as coefficient alpha (BEARDEN; NETEMEYER, 1999, p. 4). These criteria are explained below.
1. Split-Half Reliability. According to Trochim (2002), in split-half reliability we randomly divide all items that purport to measure the same construct into two sets. We administer the entire instrument to a sample of people and calculate the total score for each randomly divided half. The split-half reliability estimate is simply the correlation between these two total scores.
2. Cronbach's Alpha. It is mathematically equivalent to the average of all possible split-half estimates, although that is not how we compute it. Malhotra (2001, p. 264) defines it as a "reliability measure of internal consistency that is the average of all possible estimates resultant of the different separation/division of the scale in two halves." A low coefficient alpha indicates that the sample of items performs poorly in capturing the construct that motivated the measure. Conversely, a large alpha indicates that the k-item test correlates well with true scores (CHURCHILL Jr., 1996, p. 68).
3. Item-Total. A recently used rule of thumb for corrected item-to-total correlations is that they should be 0.50 or greater to retain an item (BEARDEN et al., 1989; SHIMP; SHARMA, 1987).
4. Item-Item or Inter-Item. Rules of thumb for individual correlations in the inter-item correlation matrix vary (see, for instance, JARVIS and PETTY, 1996).
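The criteria listed above can be sketched in a few lines. The example is illustrative only: the function names and the response matrix are hypothetical, coefficient alpha is computed with the usual variance-based formula, alpha = k/(k-1) * (1 - sum of item variances / variance of the total score), and the split-half estimate is the plain correlation between two randomly formed halves, as in the description attributed to Trochim (2002).

```python
import random
from math import sqrt
from statistics import variance


def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)


def cronbach_alpha(rows):
    """Coefficient alpha; rows are respondents, columns are items."""
    k = len(rows[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    item_variances = [variance(column) for column in zip(*rows)]
    total_variance = variance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


def split_half(rows, seed=0):
    """Correlation between the total scores of two random halves of the items."""
    indices = list(range(len(rows[0])))
    random.Random(seed).shuffle(indices)  # seeded only to make the split repeatable
    half = len(indices) // 2
    first, second = indices[:half], indices[half:]
    score_first = [sum(row[i] for i in first) for row in rows]
    score_second = [sum(row[i] for i in second) for row in rows]
    return pearson(score_first, score_second)


def corrected_item_total(rows):
    """Correlation of each item with the sum of the remaining items."""
    correlations = []
    for i in range(len(rows[0])):
        item = [row[i] for row in rows]
        rest = [sum(row) - row[i] for row in rows]  # total without the item itself
        correlations.append(pearson(item, rest))
    return correlations


# Hypothetical answers of six respondents to a four-item, 5-point scale
responses = [
    [4, 5, 4, 5],
    [2, 3, 2, 2],
    [5, 5, 4, 4],
    [1, 2, 1, 2],
    [3, 3, 4, 3],
    [4, 4, 5, 5],
]

print(round(cronbach_alpha(responses), 3))
print(round(split_half(responses), 3))
# Items below the 0.50 rule of thumb cited above would be candidates for removal
print([round(r, 3) for r in corrected_item_total(responses)])
```

With real data, the sample would of course be far larger than six respondents; the tiny matrix is used here only to keep the arithmetic easy to follow.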

CONTENT VALIDITY
On the surface, the scale items should appear consistent with the theoretical domain of the construct (BEARDEN; NETEMEYER, 1999). Since there is no formal statistical test for content validity, the researcher's judgment and insight must be applied (GARVER; MENTZER, 1999, p. 34). Content Validity is also called Face Validity. In practice, content validity is not assessed by statistical means; rather, it is sought through the use of a representative collection of items and a sensible method of test construction.

CRITERION VALIDITY
It refers to the success of the instruments used for estimation or forecasting (COOPER; SCHINDLER, 2004).
1. Predictive Validity. It consists of determining the extent to which particular measures predict other criterion measures, so it has much pragmatic or managerial meaning in marketing (HEELER; RAY, 1972, p. 361). Sometimes it is also called Criterion-Related Validity or Pragmatic Validity. According to Churchill Jr. (1996, p. 402) it is the "usefulness of the measuring instrument as a predictor of some other characteristic or behavior of the individual."
2. Concurrent Validity. In concurrent validity, the operationalization's ability to distinguish between groups that theoretically should be distinguishable is assessed. For example, if we come up with a way of assessing manic-depression, our measure should be able to distinguish between people who are diagnosed with manic-depression and those diagnosed as paranoid schizophrenic. If the objective is to assess the concurrent validity of a new measure of empowerment, we might give the measure to both migrant farm workers and the farm owners, theorizing that our measure should show that the farm owners are higher in empowerment (TROCHIM, 2002).

CONSTRUCT VALIDITY
Assessment of how well the instrument captures the construct, concept, or trait that it is supposed to be measuring; [thus] a measuring instrument designed to measure attitude would be said to have construct validity if it indeed measured the attitude in question and not some other underlying characteristic of the individual that affects his or her score (CHURCHILL Jr., 1996, p. 404).
The construct validity divisions are:
1. Discriminant Validity. According to Churchill Jr. (1979, p. 70) it is the extent to which the measure is indeed novel and not simply a reflection of some other variable. The researcher examines the degree to which the operationalization is not similar to (diverges from) other operationalizations to which it theoretically should not be similar.
According to Heeler and Ray (1972, p. 363), discriminant validation is done by comparing the convergent validities with: (1) correlations between different traits measured by different methods and (2) correlations between different traits measured by the same method.
2. Convergent Validity. It is the extent to which the measure correlates highly with other methods designed to measure the same construct (CHURCHILL Jr., 1979, p. 70). In other words, convergent validity is the degree to which multiple measures of the same construct demonstrate agreement or convergence (MARSH et al., 2002, p. 94). It should be noted that Multitrait-Multimethod Matrices have often been used, as a methodology, to assess convergent and discriminant validity.
3. Nomological Validity. According to Churchill Jr. (1996, p. 538), an instrument has nomological validity if it "behaves as expected with respect to some other construct to which it is theoretically related."
4. Known-Group Validity. Known-group validity asks the question "Can the measure reliably distinguish between groups of people who should score high on the trait and low on the trait?" (BEARDEN; NETEMEYER, 1999). For instance, a person who is truly conservative should score significantly higher on a conservatism scale than a person who is liberal; likewise, a salesperson in the retail car business and a salesperson in the large computer business should differ in their levels of customer orientation (SAXE; WEITZ, 1982).

(C) GENERALIZATION
Can the results of a study be generalized? Generalization indicates whether researchers believe that the results can be generalized to a larger sample (MALHOTRA, 2001).

(D) PRACTICAL APPLICABILITY
It means that the instrument should be economical, convenient, and interpretable (COOPER; SCHINDLER, 2004). Economy means that the instrument should strike a balance between the number of items (high reliability) and the time needed to fill it in. Convenience means that the instrument should be clear, detailed in its instructions, clear in its concepts, and easy to administer. Interpretability implies that other people, besides the author, must be able to understand the survey results.

CONFIRMATORY FACTOR ANALYSIS (CFA)
According to Bagozzi et al. (1991, p. 429), the CFA model allows methods to affect measures of traits in different degrees and to correlate freely among themselves. In addition: (1) measures of the overall degree of fit are provided in any particular application (e.g., the chi-squared goodness-of-fit test), (2) useful information is supplied as to if and how well convergent and discriminant validity are achieved (i.e., through chi-squared difference tests, the size of factor loadings for traits, and the estimates for trait correlations), and (3) explicit results are available for partitioning variance into trait, method, and error components (i.e., through squared factor loadings and error variance).
Volume 8, n. 4, 2007, p. 11-34

MULTITRAIT-MULTIMETHOD MATRIX (MTMM) 1
It is another option for assessing construct validity. The Multitrait-Multimethod Matrix is the correlation matrix for different concepts (traits) in which each concept is measured by different methods (BAGOZZI; YI, 1991). According to Heeler and Ray (1972, p. 363), the procedure simply consists of a matrix of correlations between several variables or concepts (traits) each measured by several techniques (methods) [and] such a matrix can provide the basis for an examination of both convergent and discriminant validation.
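As a rough sketch of how such a matrix can be read, the example below correlates every pair of measures and sorts the correlations into the three groups used in the comparison attributed to Heeler and Ray (1972): same trait with different methods (convergent validities), different traits with different methods, and different traits with the same method. The trait names, method names, scores, and the function `mtmm_groups` are hypothetical, and the simple comparison at the end is only one informal reading of the matrix, not a replacement for the formal procedures discussed in the literature.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)


def mtmm_groups(measures):
    """Group pairwise correlations; keys of `measures` are (trait, method)."""
    groups = {
        "convergent": {},                # same trait, different methods
        "heterotrait_heteromethod": {},  # different traits, different methods
        "heterotrait_monomethod": {},    # different traits, same method
    }
    for (key_a, scores_a), (key_b, scores_b) in combinations(measures.items(), 2):
        r = pearson(scores_a, scores_b)
        same_trait = key_a[0] == key_b[0]
        same_method = key_a[1] == key_b[1]
        if same_trait and not same_method:
            groups["convergent"][(key_a, key_b)] = r
        elif not same_trait and not same_method:
            groups["heterotrait_heteromethod"][(key_a, key_b)] = r
        elif not same_trait and same_method:
            groups["heterotrait_monomethod"][(key_a, key_b)] = r
    return groups


# Hypothetical scores of six respondents: two traits, each measured by two methods
measures = {
    ("satisfaction", "likert"): [1, 2, 3, 4, 5, 6],
    ("satisfaction", "semantic_differential"): [1, 3, 2, 4, 6, 5],
    ("loyalty", "likert"): [3, 1, 5, 2, 6, 4],
    ("loyalty", "semantic_differential"): [2, 1, 5, 3, 6, 4],
}

groups = mtmm_groups(measures)
lowest_convergent = min(groups["convergent"].values())
highest_heterotrait = max(
    list(groups["heterotrait_heteromethod"].values())
    + list(groups["heterotrait_monomethod"].values())
)
# Informal check: convergent validities should exceed the heterotrait correlations
print(lowest_convergent > highest_heterotrait)
```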

UNIDIMENSIONALITY
It is defined as the existence of one construct underlying a set of items (HUNTER, 1987). Thus, unidimensionality is the degree to which items represent one and only one underlying latent variable (GARVER; MENTZER, 1999, p. 35).
In sum, the single framework is an attempt to present most of the different types of scale assessment available in the theory. Since this paper aims to analyze the marketing scales created and tested in Brazil according to certain criteria, these criteria need a detailed explanation, and the structure presented in Figure 1 can help in visualizing the concepts.
1 Bagozzi et al. (1991, p. 422) comment that "there are at least ten different procedures proposed for the analysis of MTMM matrices, each built on a different set of assumptions and each is appropriate only under certain circumstances."

PRODUCTION: A HISTORICAL EVOLUTION
The Brazilian marketing field has been the object of analysis of other meta-studies, and it is important to discuss what has been done in favor of Brazilian marketing progress. This topic analyzes the papers that represent the state of the art in Brazilian marketing meta-studies. The six papers in this discussion are: Vieira (1998, 1999, 2000), Perin et al. (2000), Botelho and Macera (2001), and Brei and Liberaldi (2004). First, Vieira (1998, 1999, 2000) wrote a trilogy evaluating marketing scientific production. Vieira (1998) measured all marketing papers published in EnANPAD from 1990 to 1997 according to variables such as theme, number of authors, academic affiliation, number of references per article, total number of journals quoted, total number of international journals quoted, and total number of Brazilian authors quoted. Vieira's results, after 3,208 bibliographic references were reviewed, showed that the EnANPAD marketing meeting, which is the major Brazilian congress, is not serving as a reference for Brazilian authors. This means that Brazilian researchers have been rejecting their own scientific references.
Following the trilogy, Vieira (1999) analyzed the trends and research priorities in marketing according to the opinion of Brazilian researchers. Vieira's results showed that the most researched themes were Consumer Behavior, Marketing Strategy, and Marketing Service, in that order. In addition, his results showed that marketing scientific production is concentrated in universities such as UFRGS, USP, and UFRJ. The authors used books more intensively than journals in preparing their papers; but when they used journals, these were primarily international.
In his last study, Vieira (2000) analyzed the qualitative profile of the Brazilian researcher. Vieira's results showed that the researcher is predominantly male, with more than 15 years of experience. Articles were mainly pragmatic, and the marketing scientific production is characterized as "insufficient and novice." Moreover, regarding the top Brazilian journals, the research showed that the most read is Revista de Administração de Empresas (RAE), but the most quoted is Revista de Administração (RAUSP).
Perin et al. (2000) conducted a survey of the EnANPAD marketing papers of the 1990s. The variables were research type, research design, questionnaire, construct reliability and validity, results presentation, research problem, and data analysis. Their conclusions showed that the quality of the EnANPAD marketing papers is "questionable," mainly because of weak rigor. Moreover, little attention was given to hypotheses, research models, the validity of variables, and so forth.
Botelho and Macera (2001) analyzed the PhD and MBA marketing theses published at EAESP-FGV, which is one of the best Brazilian business schools and one of the oldest in Brazil to offer PhD and Master's degree courses. The variables analyzed were syntax, semantics, and pragmatics. The results showed that some criteria, such as richness, specification, and empirical support, were only weakly present. Brei and Liberaldi (2004)
In sum, Vieira (1998, 1999, 2000), Perin et al. (2000), Botelho and Macera (2001), and Brei and Liberaldi (2004) presented interesting aspects of Brazilian marketing scientific production; however, none specifically discussed the scales in Brazilian marketing papers. Thus, we hope to fill this gap by discussing the aspects neglected by the other authors.

METHODOLOGY
First, it is important to define the types of articles to be included in the study. It was decided to concentrate on articles addressing only Brazilian marketing research scales. As a result, articles focusing on other fields of Business Administration (e.g., Organizations and Information Systems) were automatically excluded. If they had all been included, there could have been more than 70 instruments. Using the guidelines proposed and discussed in Figure 1, the articles were classified in the tables. Some variables were created beforehand for the classification. These variables are shown in Table 1 (Brazilian New Scales) and Table 2 (Scales replicated). The classification was based on the theory. Not all concepts presented in our framework were analyzed in the papers (for instance, the Multitrait-Multimethod Matrix), since space in this article is limited. The sample of this research comprises articles published in RAC (Revista de Administração Contemporânea), RAE (Revista de Administração de Empresas), RAUSP (Revista de Administração), EMA (Encontro de Marketing), and EnANPAD (Encontro Anual da Associação de Pós-Graduação em Administração).
The choice is justified because RAC, RAE, and RAUSP are the top three Brazilian business journals, EnANPAD is the major and oldest Brazilian meeting in Business, and EMA is the only Brazilian professional meeting in marketing. All use a double-blind review process and reflect the highest quality in Brazilian marketing academic production.

DATA ANALYSIS
The second important finding in Table 1 is that test-retest reliability was also not frequently used. Only two (18%) papers (SANTOS; MUNIZ, 2004; PAIVA, 2004) used test-retest measures. Test-retest could identify problems in the instrument, since it compares the scores of the first measurement with those of the second. Ideally, both results should be similar.
In addition, the CFA procedure was used by 6 papers (55%), according to Table 1. It appears that Brazilian researchers are following the suggestion of Gerbing and Anderson (1988) and using an additional technique to achieve better validity. CFA can be carried out with software such as LISREL, EQS, and AMOS. Cronbach's alphas for the scales in Table 1 ranged from α = 0.61 to α = 0.97, with a mean of 0.793. This appears to be a good value for new instruments. However, the results should be viewed with caution, since much work is needed to improve the use of CFA.
Table 2, "Brazilian psychometrically tested Scales," presents the scales that were psychometrically tested by Brazilian researchers. Some of the instruments are very well known in the international marketing research community. Some results were unexpected. For example, Sampaio and Perin (2001) found that the MARKOR scale should be treated as unidimensional rather than multidimensional. Siqueira (2004) suggested that the SERVPERF scale needs to be reviewed and that new variables need to be included in the model. Souza and Luce (2003) did not confirm the four-factor structure suggested by the TRI-Index Instrument, indicating a better fit with a five-factor structure.
Unfortunately, reviewing the tested scales, we also find that not all scales used nomological validity (Table 2). It appears that Brazilian researchers do not pay much attention to that important criterion in scale validation. Therefore, we strongly suggest that future research test that item.
Another important suggestion, according to the results in Table 2, is that in testing the psychometric properties of scales, Brazilian researchers did not assess the Alternative Forms of reliability available. Assessing them could generate additional results and/or improve the research outcome.
Content validity was used in some cases (40%). Content validity could be verified more intensively with experts, professionals, and academics (MBA or PhD students). We encourage more of this work among professionals and colleagues, since it is not a difficult task and could produce interesting results.
In addition, some results are worrying: CFA was used in 53% of the cases, content validity was used in just 40%, and an instrument pre-test in a small sample was used in 60% of the cases.
In summary, Brazilian researchers should move beyond testing only internal consistency and convergent and discriminant validity. Scales are built to measure concepts that will help to improve theory and applicability. Therefore, the tools available in the theory must be used for that purpose, to improve scientific knowledge.
The authors found that the three dimensions of MARKOR should be one. It means that the market orientation construct (multidimensional) should have one dimension instead of three. The authors did not support the discriminant validity.

5 Since this article has size limitations, the complete references of those authors are not presented at the end of the paper.

DISCUSSION
The main conclusion from this empirical research is that Brazilian researchers are not using the tools made available by the theory for assessing marketing scales. Based on this analysis, if authors had used the existing tools, some results could have been improved or become more evident. It is time to stop relying only on internal consistency and, sometimes, convergent and discriminant tools. It is necessary to move beyond the better-known tools (i.e., EFA and Cronbach's alpha) and complement them with nomological and CFA procedures in order to reach better conclusions.

PAPER CONTRIBUTIONS
If researchers apply these scales without a sound theory, without understanding the importance of situations, without knowing the limits of measures of individual differences, and without careful definition of goals, they may become dissatisfied, disappointed and discover that the mindless use of scales cannot fix consumer research, just as hammers cannot fix micro-electronic devices (KAHLE, 1994, p. 429).
First, it is important to note that there are many ways of assessing a scale. Figure 1 is an effort to present most of them in an aggregate form, although it is recognized that the literature has been discussing each one separately and in greater depth. Therefore, this paper tries to contribute to marketing theory by proposing a framework that researchers can use as a guide for consulting, testing, or developing their scales.
Second, Table 1 is a result of the contribution of Brazilian researchers to international marketing knowledge, since it lists the proposed Brazilian scales. This effort should be noted by the international academy, because Portuguese-language countries such as Portugal, Cape Verde, Guinea-Bissau, and Mozambique could use those scales in their investigations.
In addition, Table 1 presents the constructs for which those new scales were developed; these variables now need to be put to the test and refined. We need to establish the real validity of Brazilian scales in cross-country studies. This could also bring Brazilian researchers closer to their international peers, allowing future research on instrument cross-validation.
Third, according to Table 2, international researchers, mainly those from the USA, can benefit from the results of some scales that were tested in Brazil (e.g., TRI, SERVQUAL, and SERVPERF). Therefore, international researchers could review their scales in terms of strengths and weaknesses and develop more valid instruments.
Fourth, this research should be helpful in reducing the time it takes for Brazilian researchers as well as market professionals to find scales for their surveys, since it gathers all Brazilian marketing scales in one article. Therefore, this paper might be used as a guide for future consultation when preparing new questionnaires.
Fifth, this paper may identify areas where new measures are needed. For example, researchers can review the tables and identify fields that are presently lacking measures. Thus, it encourages further development of new valid measures.
Sixth, according to the theory (STEWARD, 1993), the availability of those scales may reduce the frequent use of ad hoc scales in marketing research and increase attention to the quality of measurement in empirical studies in consumer and marketing research.
Seventh, this paper also contributes to other meta-studies in Brazil. In fact, other papers have presented different perspectives on the evolution of Brazilian marketing, such as Vieira (1998, 1999, 2000), Perin et al. (2000), Botelho and Macera (2001), and Brei and Liberaldi (2004). Therefore, this article also helps in strengthening the Brazilian marketing field.

FUTURE RESEARCH
Since the marketing research field (especially marketing scales and measurement) needs to continue its development, we present some paths for future research. First, future tests could be performed on the existing Brazilian scales. In other words, other researchers could psychometrically test the Brazilian scales in an international context, verifying their properties, dimensions, and theoretical support. Second, future research could be pursued in other countries, presenting the scales validated and created there in a manner similar to that of this work. For instance, scholars might not know of, or might have little access to, some useful instruments that were validated and created in Mexico or Germany. Without such a survey, some instruments might remain hidden from international tests. To conclude, other research could improve the framework suggested in this initial proposal. As a consequence, marketing theory could have a more valid and reliable guide for testing scales.