A Critical Analysis of Measurement Models of Export Performance

Poor conceptualization of the export performance construct may undermine theory development efforts and may be one of the reasons behind the often conflicting findings in empirical research on the export performance phenomenon. This article reviews the conceptual and empirical literature and proposes a new analytical scheme that may serve as a standard for judging content validity and a guiding yardstick for drawing operational representations of the construct. A critical assessment of some of the most frequently cited measurement frameworks, followed by an analysis of recent (1999-2004) empirical research, leaves no doubt that there are flaws in the conceptualization and operationalization of the performance construct that ought to be addressed. A new measurement model is advanced along with some guidelines which are suggested for its future empirical validation. The new measurement framework allegedly improves on other past efforts in terms of breadth of coverage of the construct’s domain (content validity). It also offers a measurement perspective (with the simultaneous use of both formative and reflective approaches) that appears to reflect better the nature of the construct.


INTRODUCTION
Studies on export performance have reached inconsistent and even contradictory findings. Such conflicting results may be due, among other possible reasons, to differences in the conceptualization, operationalization and measurement of the export performance construct. Bagozzi and Phillips (1982) have cautioned against the chances of incorrectly rejecting (or failing to reject) a hypothesis due to "lack of correspondence between the measurements and the concepts that the measurements are intended to represent" (p. 459). Peter (1981) has put it this way: "theories cannot develop unless there is a high degree of correspondence between abstract constructs and the procedures used to operationalize them" (p. 133).
Although some researchers have advanced useful conceptual and operational frameworks, they all suffer from content limitations (in terms of collective exhaustiveness of the construct's domain) as well as methodological shortcomings (e.g., the modeled relationship between indicators and construct). Therefore, the purpose of this paper is three-fold: (i) to advance a new analytical framework that could serve as a general guideline for the conceptualization and measurement of the export performance construct; (ii) to critically assess some of the most frequently cited export performance measurement frameworks and also recent (1999-2004) empirical works in order to determine how the export performance construct has been conceptualized and operationalized in the empirical literature; and (iii) to propose a new measurement model (as yet untested) and corresponding suggestive validation guidelines that would supposedly overcome some of the flaws identified in previously employed models.

PROPOSITION OF A GENERIC ANALYTICAL FRAMEWORK
The most prominent journals in International Business (according to Dubois & Reeb, 2000) were consulted and 12 articles (including conceptual works, empirical studies, meta-analyses, and consolidation efforts) were identified (see Table 1). They seem to represent the best efforts made to date to conceptually characterize the multifaceted nature of the export performance phenomenon, and they have also served as general references for several empirical studies in the field. The multidimensional nature of the export performance phenomenon has been acknowledged over the years. This evolution notwithstanding, there remain some flaws in the analytical frameworks that have been proposed: some of them are incomplete because they do not include some key characteristics of the export performance phenomenon; some tap aspects that conceptually lie outside the export performance domain; and all of them fail to fully conform to criteria of internal homogeneity and collective exhaustiveness (details can be found in Carneiro, Hemais, Da Rocha, & Da Silva, 2005). Therefore, a new generic analytical framework for the characterization of the export performance construct, which builds heavily upon, and improves on, those of Matthyssens and Pauwels (1996) and Katsikeas et al. (2000), is advanced (Figure 1). This analytical framework includes two major classification dimensions: conceptual aspects (definition and characterization of the phenomenon) and methodological decisions (data collection and operational representation of the construct). Conceptual aspects include: classes of measures, frame of reference and temporal orientation.

Classes of measures. Economic (e.g., profitability, sales), market (e.g., market share, reputation, customer satisfaction), behavioral / situational (e.g., attitudes towards exports, exporter vs. non-exporter dichotomy), strategic (which involve attainment of broader, usually longer-term, objectives, such as developing competencies, retaliating against a competitor, entering business networks), overall evaluation (e.g., perceived success, satisfaction with export activities, confirmation of expectations), and others (which might include internal business processes, innovation and learning, social or environmental measures). Although behavioral / situational measures should not be considered strict measures of performance, but rather an indication of a pattern of behavior or of a state of affairs, this class of measures is kept in the newly proposed framework because it has frequently been used in the past. In this way, it will be possible to compare the evolution of the use of different classes of measures.
Frame of reference. Absolute (reporting the value itself); or relative, that is, "good" or "bad" depending on the value of some point of reference which could be: main competitors' average, some benchmark, domestic operations, other international ventures of the firm, or pre-set goals.
Temporal orientation. Static (measured at a given point in time) or dynamic (indicating change between two periods of time). Both static and dynamic measures can cover either a past or a future time.
Among the conceptual aspects, some scholars (e.g., Barney, 1996; Chakravarthy, 1986; Fiegenbaum, Hart, & Schendel, 1996) would include stakeholders' viewpoint (e.g., stockholders, customers, employees, local community etc.). However, since stakeholders' viewpoint and classes of measures seem to be strongly related (e.g., financial ↔ stockholders, market ↔ clients), and there has been some controversy on which types of stakeholders ought to be considered, only classes of measures has been included in the present model.
As for the methodological dimensions, they include: unit of analysis, mode of assessment and indicators structure.
Unit of analysis. Some (part of the) firm-market combination: corporate (all firm's markets), SBU (or product / product line in all firm's markets), all firm export ventures (in all export markets), specific export venture (specific product-market combination).
Mode of assessment. Data may be (considered) objective (supposed to be the same no matter what the specific source or who reports it), collected either from secondary sources or from primary (self-reported) sources, or subjective (dependent upon the personal opinion or perception of the respondent), collected either from primary sources (be it self-evaluation or evaluation by competitors or by external experts) or from secondary sources (case material).
Indicators structure. The totality of the performance indicators collected can be arranged in different combinations, be it for interpretative purposes or for use in statistical procedures. A researcher may use manifest-like (i.e., directly observed) variables, either just one single indicator (an approach that has several drawbacks) or multiple independent indicators. Alternatively, indicators can be combined to form composite scales (latent variables). These could be reflective in nature, whereby the (observed) indicators of performance are considered or assumed to be effects or manifestations of a (latent) performance factor, or formative, whereby the (observed) indicators are assumed to 'cause' or determine performance (see Diamantopoulos, 1999, for details). A researcher can use either one single scale, which would incorporate all performance indicators, or multiple scales, each of which would represent a combination of a group of indicators. Of course, the researcher can also use composite scales together with 'independent' indicators.
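The practical difference between the two perspectives can be illustrated with a small simulation. The following is only a sketch under stated assumptions: the sample size, the loading of 0.8, the formative weights and the disturbance term are hypothetical values chosen for illustration and are not taken from any of the studies reviewed. It shows that reflective indicators, being effects of a common factor, are expected to correlate with one another, whereas formative indicators may be mutually uncorrelated and still jointly determine the construct.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def mean_pairwise_corr(columns):
    """Average correlation over all distinct pairs of indicator columns."""
    k = len(columns)
    pairs = [(i, j) for i in range(k) for j in range(i + 1, k)]
    return sum(pearson(columns[i], columns[j]) for i, j in pairs) / len(pairs)


def simulate_reflective(n, loading, k, rng):
    """Reflective case: each indicator is an effect of one latent factor."""
    eta = [rng.gauss(0, 1) for _ in range(n)]
    resid_sd = sqrt(1 - loading ** 2)  # keeps each indicator at unit variance
    return [[loading * e + resid_sd * rng.gauss(0, 1) for e in eta]
            for _ in range(k)]


def simulate_formative(n, weights, rng):
    """Formative case: independent indicators jointly 'cause' the construct."""
    cols = [[rng.gauss(0, 1) for _ in range(n)] for _ in weights]
    eta = [sum(w * col[i] for w, col in zip(weights, cols))
           + 0.3 * rng.gauss(0, 1)  # hypothetical disturbance term
           for i in range(n)]
    return cols, eta


rng = random.Random(42)  # fixed seed so the sketch is reproducible
reflective = simulate_reflective(2000, 0.8, 3, rng)
formative, construct = simulate_formative(2000, [0.5, 0.3, 0.2], rng)

print("mean inter-item correlation, reflective:",
      round(mean_pairwise_corr(reflective), 2))
print("mean inter-item correlation, formative:",
      round(mean_pairwise_corr(formative), 2))
```

Under these assumptions, internal-consistency statistics are informative for the reflective set only; for the formative set, low inter-item correlations say nothing about the quality of the measure.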

CRITICAL ANALYSIS OF THE EMPIRICAL LITERATURE IN THE LIGHT OF THE NEW FRAMEWORK
In this section, a critical analysis will be conducted of 'classical' (i.e., frequently cited) measurement models and of recent (1999-2004) empirical studies of export performance.

Critical Analysis of 'Classical' Empirical Works
Some sound export performance measurement frameworks have been proposed in the literature, albeit not without their flaws. Seven very frequently cited frameworks (Cavusgil & Zou, 1994; Cooper & Kleinschmidt, 1985; Louter, Ouwerkerk, & Bakker, 1991; Shoham, 1998, 1999; Styles, 1998; Zou, Taylor, & Osland, 1998), which may be regarded as representative of the best efforts in the area, and one additional work (Lages & Lages, 2004, a recent measurement scheme that stands out in terms of content (what is measured) and form (how it is measured)) are critically analyzed in order to show in what aspects they do not conform to the generic conceptual and methodological framework advanced in this article.
Cavusgil and Zou (1994) identified one performance factor composed of four indicators: the extent to which strategic goals are achieved; perceived success of the venture; average sales growth over the first five years; and average profitability over the first five years. Diamantopoulos (1999) criticized Cavusgil and Zou for using (albeit implicitly) reflective indicators when in fact "their domain of content is not homogeneous" (p. 451), i.e., their indicators, by their very dissimilar nature, cannot be considered interchangeable. This argument can be reinforced by the fact that the high item-factor correlations of their 'strategic goals' indicator (which Cavusgil & Zou took as supporting evidence of their measurement model) may in fact suggest that a formative structure should at least have been tested. Moreover, 'perceived success', which is an overall measure, would be better used as a 'criterion' for the validation of the responses to the other indicators.
Styles (1998) refined Cavusgil and Zou's (1994) model, not only by using more fine-grained scoring for the indicators, but also by including the evaluation by competitors (although from the perception of respondents in each focal firm) as a way to assess convergent validity.
Styles cross-validated the (refined) framework in two countries: Australia and the United Kingdom. Evidence was provided of factorial similarity (same number of factors, with items loading high on the same factors) and of factorial equivalence (same factor loadings, although one of the items (strategic objectives) had considerably different (0.14 and 0.38) loadings across the two samples), but (full) measurement equivalence was not obtained, so caution should be exercised when comparing results obtained in the two countries. Styles' proposition may be criticized on the grounds that only small and medium-sized companies were addressed, which casts doubt on the generalizability of the results. Furthermore, Styles stuck to a reflective measurement perspective, despite acknowledging this limitation ("the assumption of export performance being a reflective scale should be reexamined conceptually and empirically", p. 28). Styles also ran an exploratory factor analysis over the indicators, but disaggregated the attainment of strategic objectives indicator into its seven original components. The resulting indicators yielded four factors (contrasting with Cavusgil & Zou's one-factor solution) that were interpreted as: economic performance, improvement of competitive position, future expansion and passive exporting. This reinforces the idea that strategic performance may not be highly associated with economic or sales performance, at least at the export venture level. The fact that the two perceptual (overall) success indicators loaded together with the economic and sales indicators suggests that they may not be independent.
Zou, Taylor, and Osland (1998) proposed a scale (the EXPERF scale) composed of three dimensions: financial export performance (export profits, export sales and export sales growth), strategic export performance (contribution of the export venture to the firm's competitiveness, strategic position and market share), and satisfaction with export performance (perceived success of the venture, satisfaction with the venture and degree to which the venture is meeting expectations). They reported good model fit and cross-national equivalence between the U.S. and Japan (although the marginally significant chi-square (p = 0.0368) of the model tested with the Japanese sample may cast some doubt over its adequacy to the data). Diamantopoulos (1999) has already conducted a very good critique of Zou et al.'s measurement model on the grounds that a formative approach (instead of the reflective perspective implicitly used) would seem to make more sense. Their own argument about the possible differing effects of explanatory factors on each dimension of the scale ("relative importance of determinants of export performance with respect to each dimension of the EXPERF scale"; "understanding of how various factors contribute to each of the three dimensions of exporting success", p. 53-54) seems to suggest that the indicators may not move together, so a formative structure would seem more appropriate. Furthermore, their overall success dimension would seem to serve better as a confirmation of the other dimensions, since "satisfaction-based measures provide richer assessments of each sub-dimension, rather than additional, independent sub-dimensions" (Shoham, 1998, p. 62). Zou et al.'s framework is not collectively exhaustive of the domain of the export performance construct, since some relevant aspects of the phenomenon were not tapped, e.g., comparison against competitors (although this aspect might have been implicitly considered in the mental algorithm used by respondents) and expected future performance.
Shoham (1998) factor analyzed fourteen items and uncovered three factors (one of the items, market share, showed low correlations with all three factors and was subsequently dropped from further analysis): sales (satisfaction with export intensity, export intensity, satisfaction with export sales and export sales), profits (satisfaction with profit margin, export profit margin), and change (satisfaction with five-year change in export intensity, satisfaction with five-year change in sales, five-year change in export intensity, five-year change in sales, five-year change in market share (note that market share was dropped, but change in market share was kept), five-year change in profit margin, satisfaction with five-year change in profit margin). Each sub-scale (composed only of the items that loaded high on the respective factor, constraining items to load on a single factor each) showed high reliability (α ≥ 0.76), within-scale inter-item correlations exceeded respectively 0.52, 0.61 and 0.37, and item-to-scale correlations exceeded 0.66 for the sales scale and 0.61 for the change scale (item-to-scale correlations for the profitability scale were not reported). Shoham's (1998) framework can be criticized on the grounds that it considered an all-exports unit of analysis (instead of a single export venture), which should be recommended only when the determinant factors in a larger explanatory model include firm-wide variables, which was not his case. Although some authors (e.g., Madsen, 1987) have argued for change to be a dimension distinct from the economic and market dimensions, from a substantive standpoint it seems that change is an inherent and complementary aspect in the assessment of both economic and market performance and not an independent dimension. Shoham's (1998) results provide some evidence for this argument, since his operationalized sales and profits sub-dimensions are well correlated with the change sub-dimension (r = 0.56 for both).
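For readers less familiar with the reliability coefficient reported above, the following sketch computes coefficient alpha from raw item scores using the standard formula α = k/(k-1) · (1 - Σ item variances / variance of the summed scale). The item scores are hypothetical and serve only to illustrate the arithmetic; they are not data from Shoham (1998).

```python
from statistics import variance


def cronbach_alpha(items):
    """Coefficient alpha for a list of k items.

    `items` holds one list of scores per item; all lists refer to the
    same respondents in the same order.
    """
    k = len(items)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Summed scale score of each respondent across the k items.
    totals = [sum(scores) for scores in zip(*items)]
    item_variance_sum = sum(variance(column) for column in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


# Hypothetical 5-point ratings from six respondents on a three-item sub-scale.
sales_items = [
    [4, 5, 3, 2, 4, 1],
    [4, 4, 3, 2, 5, 2],
    [5, 4, 2, 3, 4, 1],
]
print(round(cronbach_alpha(sales_items), 2))
```

Note that alpha presupposes a reflective structure: it rewards high inter-item correlations, which is precisely what a formative set of indicators is not required to show.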
Lages and Lages (2004) proposed and validated (in Portugal and the U.K.) three short-term export performance sub-scales (it is not clear whether they further aggregated the sub-scales), baptized as STEP: satisfaction with short-term performance improvement (composed of export sales volume, export profitability, market share in the importing market, and overall export performance; however, judging from the question, the authors are actually measuring change in satisfaction: how satisfied are you with the results of your export venture from year 1 to year 2; much less satisfied ↔ much more satisfied), short-term export intensity improvement (actually, change; composed of percentage of exporting venture to total sales volume, percentage of exporting venture to total profitability), and expected short-term performance improvement over a one-year period (composed of export sales volume of the export venture, export sales profitability of the export venture, achievement of objectives of the export venture, satisfaction with the exporting venture; however, judging from the question, the authors are actually measuring anticipated change in performance: worsen significantly ↔ improve significantly). Confirmatory factor analysis confirmed a good fit to data (average coefficient alpha = 0.93 and 0.87 for Portugal and the U.K. respectively; average composite reliability = 0.93 and 0.87; average variance extracted = 0.82 and 0.70) and also good discriminant validity. Cross-national validation was partially supported since the model showed factorial similarity (running the model separately for the two samples and constraining indicators to load on to their pre-specified factors showed a significant chi-square, which would suggest that the model did not fit the data well, but, since the chi-square statistic is sensitive to sample size, other fit indexes were also used and they all showed a good fit. 
The authors thought it reasonable to conclude that in both countries the same factors and the same indicators for each factor would hold), factorial equivalence (same loadings; chi-square difference was non-significant), but not (full) metric equivalence (error variances were not found to be the same). Lages and Lages' model falls short of representing the export performance construct well because only a reflective perspective was (implicitly) assumed, and no relative frame of reference and no static indicator were used. Besides, they included a situational indicator (export intensity).
Shoham's (1999) measurement model considered two dimensions: export performance (satisfaction with the ratio of exports to total sales, satisfaction with export sales and satisfaction with the export sales profitability ratio), and five-year change in export performance (change in the ratio of exports to total sales, satisfaction with change in export sales and satisfaction with change in export sales profitability ratio). Composite reliability was good (ρc = 0.77 and 0.83, respectively), and the author reported that there was evidence of convergent and discriminant validity. Although well developed, Shoham's (1999) framework is flawed because it does not include any market indicator, any relative frame of reference, or any assessment of anticipated (expected) future performance. In addition, a formative structure was not even conjectured.
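The composite reliability and average variance extracted statistics cited in this section are usually derived from the standardized loadings of a confirmatory factor model. The sketch below shows the commonly used formulas; it assumes standardized loadings with uncorrelated errors, and the loadings themselves are hypothetical values, not estimates reported by Lages and Lages (2004) or Shoham (1999).

```python
def composite_reliability(loadings):
    """Composite reliability from standardized loadings.

    Assumes each indicator's error variance equals 1 - loading**2
    and that errors are uncorrelated.
    """
    loading_sum_sq = sum(loadings) ** 2
    error_variance = sum(1 - l ** 2 for l in loadings)
    return loading_sum_sq / (loading_sum_sq + error_variance)


def average_variance_extracted(loadings):
    """Mean squared standardized loading of a factor's indicators."""
    return sum(l ** 2 for l in loadings) / len(loadings)


# Hypothetical standardized loadings for a three-indicator sub-scale.
example_loadings = [0.85, 0.80, 0.75]
print(round(composite_reliability(example_loadings), 3))
print(round(average_variance_extracted(example_loadings), 3))
```

Like coefficient alpha, both statistics are meaningful only for reflective sub-scales, which is one reason why a purely reflective treatment of export performance can look psychometrically sound while still misrepresenting the construct.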
Other models analyzed (Cooper & Kleinschmidt, 1985;Louter et al., 1991) are rather simplistic in terms of domain sampling. Besides, they did not aggregate indicators into scales (latent constructs), but kept them independent as pure manifest-like variables.
In sum, none of the frameworks analyzed above seems to fully represent the export performance phenomenon. Not only do they not provide an exhaustive account of the construct's content, but there are also deficiencies in terms of assessment of construct validity. Moreover, a competing models approach that would comparatively evaluate the original measurement model against other theoretically possible alternatives was not used by any of the works reviewed here.
Critical Analysis of Recent (1999-2004) Empirical Literature
An electronic and manual search, covering the period 1999-2004, of 24 leading journals in International Business and also in Marketing-, Management- and Business-related areas, was conducted in order to identify empirical works that had dealt with the conceptualization and operationalization of export performance (see Table 2). In Table 3 the different views of export performance used in 37 recent (1999-2004) empirical studies, as well as in the seven classics reviewed before, are analyzed along each of the three conceptual aspects and the three types of methodological decisions presented in Figure 1. To be included in the analysis, a study had to be empirical and have export performance (whatever the format) as a dependent variable. In some cases, the classification of the performance indicators along some dimension could not be clearly identified and had to be inferred by the coder (this is shown in shadowed boxes in Table 3). Though not exhaustive, the set of studies reviewed in Table 3 gives a fair account of export performance definition and operationalization in recent research and may be seen as an indication of trends in the field.
It is revealing to note that none of the 37 empirical articles reviewed seems to fully conform to the analytical framework proposed in this paper. They typically cover only a few angles of the construct and do not recognize a formative perspective (or implicitly assume that none applies).
The consolidated results show that the majority (59%) of the studies used more than one class of measures and that economic measures seem to dominate empirical research. Almost half of the studies also used some overall measure, while strategic measures were used by almost one fourth of the studies. Only one study employed behavioral / situational measures and none employed other types of measures.
Twenty-two (59%) of the studies used only an absolute reference, seven (19%) used only a relative reference, while eight (22%) used both types of reference. This apparent preference for an absolute reference seems to be in conflict with the fact that, from a managerial standpoint, performance ought to be judged also against what competitors have achieved (have we fared better than them?) or against other alternatives of resource allocation (do our export activities seem to be a good application of our firm's scarce resources?). Among the studies that used a relative frame of reference, there was great variety as to the points of comparison selected.
In terms of temporal orientation, 54% of the studies employed only a static fashion, 11% employed only a dynamic fashion, while 35% used both a static and a dynamic fashion. The great majority of the studies (89%) looked only into the past. There was great variety in the time span used. Most researchers did not explicitly state what time span was considered.
Almost half of the studies did not explicitly say what unit of analysis was to be considered. There seems to be a clear preference (57%) for an all export ventures unit of analysis, but a specific export venture and a firm (corporate) scope were also explicitly or implicitly used (by 27% and 16% of the studies, respectively). SBU was not used.
There can be seen a fairly even distribution between only objective, only subjective (perceptual) and both modes of assessment. In the case of objective data, self-reports by managers were used more often than secondary data. As for subjective data, only the firm's own managers' opinions were collected.
Most studies employed multiple (more than one) independent indicators and two thirds of the studies used aggregated scale(s) (thereby using a latent construct representation for the phenomenon); however, a reflective measurement perspective was always assumed.
It is surprising that, despite the fact that some reasonably robust conceptual and measurement frameworks have already been proposed in the literature, most recent empirical works have chosen to employ rather simplistic conceptualization schemes and methodological procedures.

Notes to Table 3:
A shadowed box indicates that the specific dimension used was not explicitly acknowledged by the original authors and had to be inferred by the coder.
Frame of Reference. Relative: a = industry or main competitors average; b = benchmark; c = domestic operations; d = other international ventures in the firm; e = pre-set goals or expectations; f = not explicitly stated by the original author(s).
Temporal Orientation. Static or Dynamic: h = recent past; i = future expectations. Time span: 1 = one year; 2 = two or three years; 3 = four years or more; 4 = not explicitly stated by the original author(s).
Mode of Assessment. Objective: p = from secondary sources; q = self-reported. Subjective: r = self-evaluation by managers; s = evaluation by competitors; t = evaluation by external experts; u = case material.
Type of Indicators. Reflective or Formative: w = one single composite scale; x = multiple composite scales.
Note: Due to space limitations, the bibliographic references of the 37 recent empirical studies are not presented here, but can be obtained from the authors upon request.

PROPOSITION OF A NEW MEASUREMENT MODEL
Although it may not always be possible to measure the totality of the domain of interest, the items selected should at least be deemed to represent the concept under examination reasonably well (Hinkin, 1998). Furthermore, the structure of relationships ('causes' and effects) between the construct and its indicators should adequately represent their nature.

Preliminary Considerations
Although success at the export venture level should not be automatically assumed to be beneficial to the firm as a whole because exports use up corporate resources that might otherwise be employed elsewhere in the firm (Thach & Axinn, 1994), choosing one specific export venture as the unit of analysis would enable the researcher to better isolate the impact of influencing factors on export performance, instead of averaging out successes and failures across the firm's (exporting or other) activities (Matthyssens & Pauwels, 1996).
Economic measures are no doubt relevant. Besides, some market and also strategic measures might be interesting in order to account for some broader, not just short-term oriented, aspects of the export activity. However, since strategic objectives may vary significantly among different firms, it would be difficult to devise common objectives that would enable comparison among companies. So, one could instead collect data on some overall (aggregated) measure that would, somehow, reflect strategic (as well as other) aspects of the export performance phenomenon. For the sake of parsimony, both behavioral / situational aspects (which are not strict measures of performance) and other measures (such as operational, social or environmental measures) can be left out of the (reduced) working framework. Should an export venture have some strategic objective that may be inconsistent with economic return, at least in the short to medium term, then it ought to be excluded from the sample. This would contribute to content validity and comparability of results.
Since it is often difficult to clearly and uniformly segregate export results from corporate results (Leonidou et al., 2002) and an attempt to elicit objective information about export performance may diminish the response rate, it is advisable to use subjective measures. Besides, a comparison against one's competitors would appear to be the most meaningful reference pattern from a managerial viewpoint. Since subjective data has been shown to be highly correlated with objective data (Dess & Robinson, 1984) and respondents may in fact provide (albeit implicitly) perceptual (i.e., subjective) and relative information even if asked about an absolute measure, and given the fact that managerial action tends to be driven by perceptions and not only by cold numbers (Matthyssens & Pauwels, 1996), the choice of subjective (self-evaluation by firm's managers), relative (to competitors), measures seems reasonable.
Static measures are good to compare among firms at a given point in time, but furnish no information as to how performance has evolved or is expected to evolve. So, it is advisable to use a set of measures that somehow taps static and dynamic (change) aspects as well as past and (expected) future performance. A three-year time frame seems to be a reasonable cut-off point for managers to accurately report past performance and predict future performance.
The use of multiple (rather than single) indicators not only provides a broader coverage of the concept but also, if combined to form a scale (i.e., a latent construct, instead of pure manifest variables), gives a more reliable account of the construct (Bollen & Lennox, 1991; Hair, Black, Babin, & Anderson, 2005). Although most empirical research assumes (usually implicitly) a reflective perspective for the indicators (whereby a latent construct, export performance, would 'determine' the level of its indicators), it has, in contrast, been argued that performance could be conceptualized as a consequence of its indicators (Diamantopoulos, 1999), i.e., some performance indicators should be modeled in a formative fashion.

A New Measurement Model
Two alternative measurement models are advanced in Figure 2 and some possible indicators are suggested in Table 4. Model A incorporates three reflective economic indicators (E1, E2 and E3) and three reflective market indicators (M1, M2 and M3), which are manifest variables of two formative first-order sub-constructs (Economic Performance and Market Performance) of Export Performance. Export Performance itself is represented by two reflective overall indicators (O1 and O2). Model B is a MIMIC (multiple indicators multiple causes) model composed of six formative indicators (three economic and three market indicators), besides two reflective overall indicators (O1 and O2). Models A and B are essentially similar to those suggested by Diamantopoulos (1999, Figure 2a, p. 450) and Diamantopoulos and Winklhofer (2001, Figures 2 and 3, p. 272-273), but have been made operational by the addition of specific indicators.
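As a reading aid, the structure of Model B can be written out in equation form. This is only a sketch in conventional MIMIC notation: the symbols η (Export Performance), γ (weights of the formative indicators), λ (loadings of the reflective indicators), ζ (disturbance) and ε (measurement errors) are assumptions introduced here for illustration and do not appear in the original figure.

```latex
\begin{aligned}
\eta &= \gamma_1 E_1 + \gamma_2 E_2 + \gamma_3 E_3
      + \gamma_4 M_1 + \gamma_5 M_2 + \gamma_6 M_3 + \zeta \\
O_1 &= \lambda_1 \eta + \varepsilon_1 \\
O_2 &= \lambda_2 \eta + \varepsilon_2
\end{aligned}
```

In this form the six economic and market indicators act as causes of η, while the two overall indicators are its effects. In practice, the scale of η would have to be fixed in some way, for example by setting one loading (say λ1) to 1.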

Figure 2: Two Alternative Suggestions for a Measurement Model of Export Performance
Model A: A second-order measurement model of export performance
Model B: A MIMIC measurement model of export performance

Collectively, either one of the two models offers a broader coverage of the export performance domain than can be seen in any of the empirical studies reviewed here. As for the methodological issues, both models would use a single export venture as the unit of analysis, the mode of assessment would be subjective (self-reporting by responding managers) and the structure of the indicators would incorporate both reflective and formative scales. The use of a hybrid (formative plus reflective) perspective of measurement contributes to a more appropriate representation of the construct's nature.

Table 4: Suggestion of Possible Indicators of Export Performance

Indicators | Conceptual dimensions covered
E1: satisfaction with export venture revenues in last three years | economic, absolute, static (recent past)
E2: revenues growth of the focal export venture vis-à-vis revenues of other export ventures of the firm in last three years | economic, relative (to other firm's export ventures), dynamic (recent past)
E3: expected export venture profitability for next three years | economic, absolute, static (near future)
M1: export venture volume vis-à-vis competitors in last three years | market, relative (to competitors), static (recent past)
M2: expected volume of the focal export venture vis-à-vis volume of other export ventures of the firm for next three years | market, relative (to other firm's export ventures), static (near future)
M3: export venture volume growth in last three years | market, absolute, dynamic (recent past)
O1: overall export venture results in last three years | overall, absolute, static (recent past)
O2: expected overall export venture results for next three years | overall, absolute, static (near future)

Note: these indicators were extracted and adapted from a list of over one hundred export performance indicators found in several empirical works (the full list is available from the authors upon request).

Implications of the Use of a Formative Measurement Approach
The use of a formative structure to represent the dependent construct in a 'causal' model renders the model unidentified (cf. Bollen, 1989), unless such a model also includes consequences of this latent construct, or else reflective indicators are used to represent the latent construct as in the models suggested here. A modeling approach such as this also provides a means for assessing concurrent validity.
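The identification argument can be made concrete with a generic MIMIC specification. The notation below is the conventional one for such models and is an assumption here, not taken from the figures of Model A or Model B: $x_1, \dots, x_q$ stand for formative indicators, $y_1, \dots, y_p$ for reflective indicators, and $\eta$ for the latent export performance construct.

```latex
\begin{align}
\eta &= \gamma_1 x_1 + \gamma_2 x_2 + \dots + \gamma_q x_q + \zeta \\
y_j  &= \lambda_j \eta + \varepsilon_j, \qquad j = 1, \dots, p
\end{align}
```

With only the first equation, the coefficients $\gamma$ and the variance of the disturbance $\zeta$ cannot be estimated, because nothing observed depends on $\eta$. The second set of equations supplies that information. In this kind of specification one usually needs at least two such $y_j$ (or consequences of the construct) and a fixed scale for $\eta$, for instance by setting one $\lambda_j$ to 1.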
Not only for identification purposes, but also for nomological validity purposes, one could add consequences of the construct. For example, one could ask: 'If you could go back in time and know how the facts would unfold, would you have recommended that the same (or higher) amount of financial and managerial resources be invested in this export venture?'; or 'Would you recommend that your firm keep investing efforts in this export venture?'. By modeling these consequences as factors with reflective measures or directly as manifest variables of distinct constructs (the consequences, which are conceptually different from the focal construct), the measurement model is identified.

Suggestion of Validation Guidelines
The final selection of specific reflective indicators could follow Hinkin's (1998) suggestion of starting with twice as many indicators for each dimension as the researcher expects to retain afterwards, which should then be purified following Churchill's (1979) well-known procedures. As for the formative indicators, one could also start with twice as many, and then retain only those that substantive theory would judge adequate to represent all the desired dimensions of the construct. The deletion of formative indicators could be guided by their correlations with the others: if one indicator correlates highly with others, it may be redundant for sampling the domain of the construct efficiently (Bollen & Lennox, 1991) and, as such, a candidate for deletion for the sake of parsimony. In addition, excessive collinearity among indicators makes it difficult to separate the distinct influences of the individual indicators on the latent variable (Bollen & Lennox, 1991). Often, however, the research involves variables other than export performance, and it might not be advisable to inflate the questionnaire with additional questions. In these cases, the item generation phase will have to be simplified in order to minimize response bias caused by fatigue or boredom (Hinkin, 1998).
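The correlation-based screening of formative indicators described above can be sketched as follows. This is a minimal illustration, not a prescribed procedure: the indicator labels (X1 to X3), the response data and the cut-off of 0.8 are all hypothetical, and a flagged pair is only a candidate for deletion, subject to the conceptual check that no dimension of the construct is left unrepresented.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equally long lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def flag_redundant_pairs(indicators, threshold=0.8):
    """Return pairs of indicators whose absolute correlation reaches the
    threshold. The threshold is an arbitrary, illustrative cut-off."""
    flagged = []
    for (name_a, a), (name_b, b) in combinations(indicators.items(), 2):
        r = pearson(a, b)
        if abs(r) >= threshold:
            flagged.append((name_a, name_b, round(r, 3)))
    return flagged


# Hypothetical responses of six managers to three formative indicators
indicators = {
    "X1": [1, 2, 3, 4, 5, 6],
    "X2": [2, 1, 4, 3, 6, 5],
    "X3": [6, 1, 5, 2, 4, 3],
}

flagged = flag_redundant_pairs(indicators)
print(flagged)
```

In this made-up data only X1 and X2 are flagged. Which of the two to drop, if any, remains a matter of substantive theory.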
Convergent validity (the degree to which multiple attempts to measure the same concept with (maximally) dissimilar methods are in agreement) may be difficult to apply in a strict sense, since there does not seem to be a universally accepted, reliable and valid standard to gauge export performance. However, within-scale item correlations have been used in the literature as evidence of convergent validity (although, strictly speaking, this is in fact a test of unidimensionality).
A test of discriminant validity (the degree to which measures of distinct concepts differ) can be conducted between sub-scales of Model A (e.g., economic performance vs. market performance, both estimated as summated scales), but it may not be possible to devise appropriate external constructs for the test. Another way to assess discriminant validity (between reflective scales) is to check whether within-scale correlations (correlations between two items in the same scale) exceed between-scale correlations (correlations between an item of a given scale and an item of a distinct scale). Discriminant validity is established when the value 1.0 is not in the confidence interval (± two standard errors) around the correlation estimates of each pair of variables (Anderson & Gerbing, 1988).
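Both discriminant validity checks described above can be sketched in a few lines. The scale names, item scores, correlation estimates and standard errors below are hypothetical. In practice the correlation estimate and its standard error would come from the fitted measurement model rather than being typed in by hand.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equally long lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def within_exceeds_between(scales):
    """True if the weakest within-scale item correlation is larger than the
    strongest absolute between-scale item correlation.
    Assumes at least two scales with at least two items each."""
    labelled = [(name, item) for name, items in scales.items() for item in items]
    within, between = [], []
    for (scale_a, item_a), (scale_b, item_b) in combinations(labelled, 2):
        r = pearson(item_a, item_b)
        if scale_a == scale_b:
            within.append(r)
        else:
            between.append(abs(r))
    return min(within) > max(between)


def interval_excludes_one(estimate, standard_error):
    """True if 1.0 lies outside estimate +/- two standard errors."""
    lower = estimate - 2 * standard_error
    upper = estimate + 2 * standard_error
    return not (lower <= 1.0 <= upper)


# Hypothetical item scores from six respondents on two reflective scales
scales = {
    "economic": [[1, 2, 3, 4, 5, 6], [2, 1, 4, 3, 6, 5]],
    "market": [[6, 1, 5, 2, 4, 3], [5, 2, 6, 1, 3, 4]],
}

print(within_exceeds_between(scales))
print(interval_excludes_one(0.62, 0.08))  # hypothetical estimate and SE
```

The two checks answer slightly different questions: the first works on observed item correlations, while the second works on the estimated correlation between the constructs themselves.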
Reliability (accuracy or precision of the measuring instrument) can be assessed by means of the composite reliability index, $\rho_c = \left(\sum_i \lambda_i\right)^2 / \left[\left(\sum_i \lambda_i\right)^2 + \sum_i \varepsilon_i\right]$, where $\lambda_i$ is the loading of indicator $i$ and $\varepsilon_i$ its error variance. Unlike coefficient alpha, this index does not assume that indicators have equal factor loadings and error variances (Styles, 1998). However, reliability in the internal consistency sense and construct validity in terms of convergent and discriminant validity are not meaningful when a formative perspective of measurement is used (Bollen & Lennox, 1991), so one should resist the temptation to delete items as a means of improving Cronbach's alpha (MacKenzie, 2003) or the composite reliability index. Furthermore, once demonstrated for a given sample, the reliability of a scale cannot simply be assumed to hold universally, since it is a situational indicator of the effectiveness of the measurement instrument (Nunnally, 1978) and must be demonstrated a posteriori for every sample to which it is administered.
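As a worked illustration of the composite reliability index, the sketch below computes it for three reflective items. The loadings are hypothetical, and the error variances are derived under the assumption of a standardized solution, in which each error variance equals one minus the squared loading.

```python
def composite_reliability(loadings, error_variances):
    """Composite reliability: squared sum of the loadings divided by the
    squared sum of the loadings plus the sum of the error variances."""
    if len(loadings) != len(error_variances):
        raise ValueError("one error variance is needed per loading")
    squared_sum = sum(loadings) ** 2
    return squared_sum / (squared_sum + sum(error_variances))


# Hypothetical standardized loadings of three reflective indicators
loadings = [0.8, 0.7, 0.6]

# Assumption: standardized solution, so error variance = 1 - loading^2
error_variances = [1 - loading ** 2 for loading in loadings]

rho_c = composite_reliability(loadings, error_variances)
print(round(rho_c, 3))
```

Because the loadings enter individually, items with unequal loadings are handled directly. The index applies only to the reflective part of the model.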
As for concurrent validity, Smith (1999) argues that researchers can demonstrate it by regressing 'dimensions' derived from factor analysis onto overall assessments of the construct rated on a separate scale. Smith (1999) also reports that concurrent (in fact, he means predictive) validity is sometimes demonstrated by the ability of the scale to predict responses to questions or future behavior. Such a test could be conducted for different versions of the scale of the construct to see which seems to show greater concurrent or predictive validity, and also for different versions of the overall scores.
A chi-square statistic and several fit indexes can be used to test whether the measurement model fits the data well. If the overall model fit proves acceptable, this can be taken as supporting evidence for the set of indicators forming the index (Diamantopoulos & Winklhofer, 2001).
The contribution and significance of the individual formative indicators can also be assessed after the model is empirically validated. The empirical validation can also reveal whether some indicator coefficients (γ's) are not statistically significant, which might suggest their removal from the index. However, one should consider whether some dimension of the construct (related to such indicators) would no longer be represented; in this case, either a reconsideration of theory or a new sample ought to be sought. "Indicator elimination […] should not be divorced from conceptual considerations when a formative measurement model is involved" (Diamantopoulos & Winklhofer, 2001, p. 273).
External validity (the degree of generalizability of the relationships across populations, respondents, settings, situations and times; Hinkin, 1998; MacKenzie, 2003) should also be assessed, through both theoretical reasoning and empirical replication, in order to determine the limits of the concept's applicability and usefulness. Generalization should also include a test of cross-national equivalence in terms of factorial similarity, factorial equivalence, and (full) metric equivalence (Singh, 1995).

ADDITIONAL REMARKS, CONCLUSIONS AND SUGGESTIONS FOR FUTURE RESEARCH
As Venkatraman and Grant (1986) have put it: "… a strong linkage between concepts and their measures enhances theory development…" (p. 71). Although there does not seem to be a standard all-purpose framework for measuring export performance, a generic analytical framework for the characterization of the export performance phenomenon was presented here. Building upon previous models (especially Matthyssens & Pauwels, 1996, and Katsikeas et al., 2000), the newly proposed framework is argued to improve on them in two respects: better labeling and organization of the model categories (though one can argue this is just a matter of personal taste) and explicit consideration of an additional methodological issue (the structure of the indicators) that has not received proper attention in the empirical literature. The proposed classification scheme is expected to serve as a general guideline for researchers to make explicit their research designs and to draw a parsimonious set of performance dimensions and indicators that better fit their specific purposes.
As the review of empirical works has shown, there seems to be a lack of continuous and complementary efforts to develop a robust measurement model and to assess the quality of measurement instruments in export performance research, with different researchers advancing their own conceptual framework (not always adequately justified) and their own operationalization of the construct. Consequently, comparability among studies is impaired.
This article advanced an operational measurement model that improves over other frameworks employed in empirical research to date, not only in terms of content validity (domain sampling), but also in terms of the relationships between the construct and its indicators. A suggestive short list of validation guidelines was also presented. Matthyssens and Pauwels (1996) argue that "when studying success determinants in export marketing, a valid and reliable measure of export performance is critical" (p. 85). Besides, researchers should bear in mind that the validation of the measurement model (of each construct) should precede the testing of substantive relationships (Venkatraman & Grant, 1986). Therefore, robust conceptualization and operationalization of the determinants of export performance should also be sought. In Shoham's (1998) words: "to make results truly comparable across studies, similar measurement modeling procedures have to be followed for the factors that are expected to influence export performance" (p. 55).
Naturally, the usual limitations in the empirical literature should always be addressed, such as: single respondent (no peer assessment), within-the-firm respondent bias (no competitor or expert assessment), no triangulation of data (e.g., only subjective data) and a single data-collection method (e.g., survey), as well as all the problems related to retrospective reports.
Another interesting topic for future research was proposed by Matthyssens and Pauwels (1996) about the false dichotomy between success and failure. In their own words: "Are 'success' and 'failure' the extremes of a unidimensional performance scale? Are indicators built to measure 'success' able to measure 'failure'? Do 'success' and 'failure' have the same dimensions?" (p. 110). So, given the fact that most research is conducted only on firms (or export ventures) that are still alive, survival bias should be explicitly addressed.