Bifactor Invariance Analysis of Student Conceptions of Assessment Inventory

Student conceptions of the purposes of assessment are an important aspect of self-regulated learning. This study advances our understanding of the Student Conceptions of Assessment Inventory (SCoA) by examining the generalizability of the factorial structure of the SCoA using bifactor analysis and by conducting cross-cultural invariance testing between Brazil and New Zealand. Eight different models were specified and evaluated, with the best model adopted for invariance testing. This research adds to our understanding of the cross-cultural properties of the SCoA because the introduction of the bifactor model resulted in metric equivalence between countries, where previous research had found only partial metric equivalence. Future studies should attempt to create more items around several SCoA constructs.


Introduction
Self-regulation theory (Zimmerman, 2008) suggests that better learning outcomes arise when students (a) activate a variety of self-motivation beliefs prior to commencing learning, (b) control and observe their own performance, and (c) reflect upon and evaluate the self, causes, and outcomes. The self-evaluative phase then iteratively contributes to the activation of various self-motivation beliefs. Hence, self-regulation of learning requires understanding the purposes and consequences of evaluation, not just controlling learning processes. Self-regulation theory also indicates that certain kinds of cognitions, feelings, and actions lead to increased learning outcomes (Boekaerts & Cascallar, 2006). For example, taking responsibility for one's actions (Zimmerman, 2008), having positive affect in learning (Pekrun, Goetz, Titz, & Perry, 2002), and making use of feedback (Hattie & Timperley, 2007) are adaptive self-regulating responses. In contrast, blaming external, uncontrollable factors (Weiner, 2000), prioritising emotional well-being (Boekaerts & Corno, 2005), and ignoring learning-related evaluations are examples of maladaptive, non-regulating responses that lead to decreased academic achievement.
Self-regulated learning models incorporate reflection upon performance as an important facet; for higher education students, the majority of performance information is derived from formal assessment events. Assessment processes influence students' behaviours, learning, studying, and achievement (Entwistle, 1991; Peterson & Irving, 2008; Struyven, Dochy, & Janssens, 2005). Hence, student opinions about the nature and purpose of assessment are likely to influence student learning-related behaviours and educational achievement. Thus, an important aspect of self-regulation, often overlooked in learning research, is student conceptions of assessment.

Student Conceptions of Assessment
Literature reviews have demonstrated that students are aware of a number of purposes for assessment (Brown, 2011; Brown & Hirschfeld, 2008; Harris, Harnett, & Brown, 2009; Weekers, Brown, & Veldkamp, 2009). These include awareness that assessment can (a) help improve performance, (b) be negative and ignored, (c) trigger emotional responses, (d) improve classroom climate, (e) evaluate school quality, (f) predict intelligence and future career success, and (g) hold students accountable for learning. Further, it would seem that as students mature, and especially upon entering secondary schooling with its certification assessment, they tend to become more negative about the function of assessment (Harris, Harnett, & Brown, 2009).
In accordance with self-regulation frameworks, statistically significant increases in academic performance among high school students in New Zealand have been reported for various adaptive beliefs (Brown & Hirschfeld, 2007, 2008). Increased achievement has been reported when students endorse the beliefs that assessment makes students accountable, assessment is good for me, assessment is valid, and assessment improves student learning and teacher instruction. In contrast, negative relations with performance on standardised tests of reading comprehension and mathematics were found for the factors assessment is bad, unfair, or irrelevant/ignored. Similarly, factors identifying external attributions (e.g., assessment indicates school quality or predicts student future) had negative relations to academic performance. Furthermore, factors focused on well-being (e.g., assessment is fun or enjoyable, assessment improves class environment) had negative regression paths to achievement. The proportion of variance in academic performance explained by the conceptions of assessment factors was not trivial, with impact on academic achievement measures reaching, on average, moderate effect sizes (Brown, 2011).

The Student Conceptions of Assessment Inventory
The Student Conceptions of Assessment inventory was developed with New Zealand secondary school students. The SCoA-VI summarises student conceptions of assessment as four inter-correlated constructs (i.e., "Assessment Improves Learning and Teaching [Improvement]", "Assessment Relates to External Factors [External]", "Assessment has Affective Benefit [Affect]", and "Assessment is Irrelevant [Irrelevance]"). Note that details of the SCoA including dictionary of items and New Zealand data files are available at figshare.com (Brown, 2017).
The Improvement conception reflects an adaptive, self-regulating response consisting of two first-order factors (i.e., five items related to students using assessment to evaluate, plan, and improve their learning activities and six items related to teachers interpreting students' assessed performances so as to improve their instruction). The External conception likewise has two first-order factors (i.e., four items in which assessments measure students' future and intelligence and two items in which assessment measures the quality of schooling). These perceptions relate to a lack of personal autonomy or control, or to external locus of control attributions (i.e., it is about the school and my future), which are maladaptive, non-regulating beliefs. The Affect conception also has two first-order factors (i.e., two items in which assessment is a personally enjoyable experience and six items in which assessment benefits the class environment). These aspects of assessment relate to a sense of 'well-being' and are notionally maladaptive (Boekaerts & Corno, 2005). The Irrelevance conception, consisting of three items on assessment being ignored and a first-order factor of five items capturing students' tendency to see assessment as bad or unfair, expresses a maladaptive response since rejecting the validity of assessment lessens a growth-oriented response to being evaluated.
Two validity studies with university students showed that the SCoA factors related to motivational constructs in a manner consistent with self-regulation theory. Hirschfeld and von Brachel (2008) used a German translation of the SCoA-II (Brown & Hirschfeld, 2008) with undergraduate psychology students to examine their learning behaviours for assessment. In a good fitting model, they found that three of the SCoA factors predicted individualised learning strategies (e.g., mind mapping or summary writing). The paths from student and university accountability predicted increased self-reported usage of these strategies, while the enjoyment affective response acted as a negative predictor of individualised learning strategies. This suggests that agreement with the evaluative purpose of assessment acts adaptively to increase personal responsibility in learning behaviour, while emphasis on the affective domain appears inimical to the growth-related pathway.
The full SCoA version 6 was used with students at one American university which annually administers a low-stakes system evaluation test (Wise & Cotten, 2009). Meaningful relations between SCoA and two measures of motivation (i.e., time taken to respond to a computer administered test-response time effort and attendance at the low-stakes testing day) were found. Less guessing (i.e., longer response times) was associated with greater belief that assessment leads to improvement, while more guessing was predicted by lower Affective benefit and greater Irrelevance of assessment. Attendance on the day of the low-stakes test was considerably higher for those who endorsed improvement and affect and rejected irrelevance.
The Students' Conceptions of Assessment version 6 (SCoA-VI) uses 33 self-report items in which participants rate their level of agreement using an ordinal agreement, six-point, positively-packed rating scale (Lam & Klockars, 1982), with two negative options (strongly disagree, disagree) and four positive options (slightly, moderately, mostly, and strongly agree).

Previous cross-cultural studies of the SCoA
A cross-cultural study with higher education students in Hong Kong, China, New Zealand, and Brazil was reported recently (Brown, 2013). The SCoA inventory was broken into two halves to reduce fatigue among Hong Kong and China university students, who were also given new experimental items. Additionally, a previous study with the SCoA in Brazil had eliminated one item related to assessment telling parents about student performance from the Student Future factor (Matos, Cirino, Brown, & Leite, 2013). This meant that comparisons between Brazil and New Zealand university student responses were done in two parts: Part A consisted of two items for assessment predicts student future and the complete assessment is Irrelevant factor of eight items, while Part B had two items for School Quality, 11 items for Improvement in two factors, and eight items for Social and Affective Benefit in two factors. Four-group invariance testing, using maximum likelihood (ML) estimation, found that Part A had only configural invariance, while Part B was completely invariant. Pair-wise comparison among the four samples showed in Part A that the Brazil group differed from all others, suggesting systematic differences may exist in Brazil. An alternative explanation could lie in the use of maximum likelihood (ML) estimation, which is intended for continuous variables.
In a two-country comparison (Brazil vs. New Zealand) of the full SCoA inventory (Matos & Brown, 2015), a different measurement approach was used. The weighted least squares mean and variance-adjusted (WLSMV) estimation procedure was used to account for the ordinal nature of the response format, and all higher-order factors were removed to test an eight-factor inter-correlated model. The fit of the revised model for each sample was acceptable, but the two-group invariance test indicated that the model lacked configural and metric invariance. About half of the items had large differences in item regression weights, as did half of the factor inter-correlations. Large mean score effect sizes (d > .60) favoured New Zealand students for Teacher Improvement, Class Environment, and Student Future, while Bad favoured Brazilian students.
Hence, while the SCoA seems to have some promising characteristics in terms of cross-cultural invariance, perhaps related to the similarity of assessment cultures in universities world-wide, there are simultaneously differences related to local contexts.

Higher Education Contexts
New Zealand. Until the 11th year of schooling there are no high-stakes assessments in New Zealand. There is much assessment, including the use of standardised testing, but this is school-controlled, done largely for formative and reporting purposes, and there are no negative consequences for schools, teachers, or students as a result of poor performance. All students meeting standards are eligible for publicly funded higher education.
There are eight public universities in New Zealand and no private universities, although there is a plethora of private trade- and vocation-oriented providers of post-schooling training. University education is highly subsidised by the government, with students contributing about 10-20% of full tuition cost in fees. Entry is via completion of recognised secondary school qualifications, which for most students consist of both internally and externally administered assessments. However, entry is open to all adults aged 20 and over, provided those without the normal secondary school qualifications pass foundation courses. Faculties and programmes within universities may set higher entry standards, usually in the most competitive subject areas such as medicine, engineering, or commerce.
Brazil. Relative to the size of its economy, Brazil does not spend much on education. For instance, in higher education, the amount spent per student in Brazil is US$11.8 thousand, while the OECD average is US$16.1 thousand (OECD, 2018). Nonetheless, the number of students enrolled at the tertiary level has increased from about 1,500,000 in 1991 to over 7,000,000 in 2013. There are 2,391 universities (301 public and 2,090 private institutions), with only about 2,000,000 students enrolled in public and 5,300,000 in private universities. Hence, tertiary education in Brazil is characterised largely by students in private institutions. However, the government has recently created several scholarship programmes for students in private institutions. Additionally, in recent years, quota spaces have been set aside in public universities.
Brazil is a largely examination driven culture in which assessment is used as a student accountability mechanism. Students are evaluated at the end of the elementary, middle, and high school education stages with a standardized test. Brazil has a National System of Higher Education Assessment (SINAES), which includes assessment of student performance (National Exam of Student Performance -ENADE), institutional evaluation, and evaluation of courses.

Method
Because previous studies have demonstrated noninvariance, this study adds to our understanding of whether differences in methods of analysis might have contributed to the lack of invariance. For example, different model structures (i.e., hierarchical vs. first-order only) have produced different results. The lack of invariance in the original New Zealand model, other than the ecological argument that contextual differences in how assessment is implemented and consequently experienced cause non-invariance, may be resolved by using a bifactor method of analysis.
Bifactor models specify a general factor and domain-specific group factors. All items load on the general factor, which explains the variance common to items across the different factors and accounts for the inter-correlations among all items. The group factors are additional to the general factor and capture the shared variance among items of the same factor after partialling out the general factor. The group factors thus measure what is left of the different factors after controlling for the general factor. A previous attempt at bifactor analysis of the SCoA (Weekers, Brown, & Veldkamp, 2009) used only a four-factor model (i.e., Irrelevance, Improvement, Affect, & External) and only used New Zealand high school data. That study found that the bifactor approach was plausible since a majority of the 33 items had loadings ≥ .35 from the general factor.
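The loading pattern this implies can be sketched with a toy example. The code below uses a hypothetical 9-item, three-group-factor instrument (not the actual 33-item SCoA specification) to show the defining constraint: every item loads on the general factor plus exactly one group factor.

```python
import numpy as np

# Hypothetical example: 9 items, one general factor, three group factors.
# This only illustrates the bifactor loading pattern, not the SCoA itself.
n_items, n_groups = 9, 3

# Column 0 = general factor; columns 1..3 = group factors.
pattern = np.zeros((n_items, n_groups + 1), dtype=int)
pattern[:, 0] = 1  # every item loads on the general factor
for g in range(n_groups):
    # each group factor loads on a disjoint triplet of items
    pattern[3 * g : 3 * (g + 1), g + 1] = 1

# Each item loads on exactly two factors: the general and one group factor.
assert (pattern.sum(axis=1) == 2).all()
print(pattern)
```

Because the group factors are defined over disjoint item sets and are estimated orthogonally to the general factor, the variance each group factor captures is what remains after the common trait is partialled out.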
It is also worth noting that most published studies with the SCoA have used the maximum likelihood (ML) estimator. It can be argued that this is a valid approach because the response scale has more than the minimum five options shown to make an ordinal response scale behave as if continuous (Finney & DiStefano, 2006). However, the response options are ordered categories, and it may prove superior to use an estimator designed for ordinal options. The weighted least squares mean- and variance-adjusted (WLSMV) estimator uses an item response theory approach to determine the probability value of each score response threshold, thus placing each response option on a continuous latent scale (Muthén & Muthén, 2012). The WLSMV estimator certainly takes a more conservative approach to determining the fit of a model than maximum likelihood estimation procedures, suggesting it might be more resistant to erroneously accepting that the model fits the data when it does not (Li, 2016).
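The threshold idea behind WLSMV can be illustrated for a single ordinal item: each threshold is the standard-normal quantile of the cumulative proportion of responses at or below a category. The sketch below shows only this univariate step underlying polychoric/WLSMV estimation, with made-up responses; it is not Mplus's actual implementation.

```python
import numpy as np
from scipy.stats import norm

def response_thresholds(responses, n_categories):
    """Estimate latent-scale thresholds for one ordinal item.

    Each threshold is the standard-normal quantile (probit) of the
    cumulative proportion of responses at or below a category, which
    places the ordered categories on a continuous latent scale.
    """
    responses = np.asarray(responses)
    # Cumulative proportions for categories 1..k-1 (the last is always 1.0).
    cum_props = [(responses <= c).mean() for c in range(1, n_categories)]
    return norm.ppf(cum_props)

# Hypothetical six-point SCoA-style item with mostly positive responses.
item = [1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 6, 6]
print(response_thresholds(item, 6))
```

A six-category item yields five thresholds; because the cumulative proportions are non-decreasing, the thresholds are ordered on the latent continuum.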
Thus, this study advances our understanding of the SCoA by examining all eight factors of the SCoA using bifactor analysis with the WLSMV estimator and by conducting cross-cultural invariance testing, also using the WLSMV estimator. We compare the invariance models following the recommendation of Cheung and Rensvold (2002) that a CFI difference of .01 or more between two models indicates that the more constrained model does not fit the data as well as the less constrained model.

Participants
Since the two samples being compared had quite different research agendas, few common demographic characteristics were available. However, both groups consisted entirely of undergraduate students. In each sample, nearly twice as many females as males participated, no doubt consistent with the greater tendency for voluntary participation among women (Table 1). Only the Brazil sample met the conventional expectation of a large sample size with more than 400 participants (Boomsma & Hoogland, 2001). The Brazilian sample was older on average but had much less university education experience than the New Zealand sample. It is also worth noting that in New Zealand all students were enrolled in one publicly funded institution, whereas a mixture of public and private enrolments was seen in Brazil. The Brazilian sample reflects the contextual reality, since the majority of students in Brazil are enrolled in private institutions. These various experience and institutional factors may contribute to patterns of equivalence or non-equivalence.

Instrument
In an adaptation and validation of the Students' Conceptions of Assessment (SCoA) version VI for the Brazilian context, Matos et al. (2013) translated the inventory into Portuguese. Afterwards, three independent researchers evaluated the translation quality via back translation. Additionally, cognitive interviews were conducted with 12 undergraduate students from public and private universities. Only one item was eliminated from the Student Future factor (i.e., item 33, Assessment tells my parents how much I've learnt) on the basis that Brazilian tertiary students believed this item only made sense for younger students (Matos et al., 2013). Hence, for comparison purposes in this paper, item 33 from the New Zealand data has been excluded. Supplementary Appendix 1 provides the items by factor in both languages.

Analysis
The goal of the study was to find a model that retained as much of the original SCoA structure as possible while maximizing the probability that the model would fit the data from both samples equally well. The models analyzed in this study were derived initially from the structure of the original SCoA-VI model as published in two different studies (Models 1-3). A combination of these models was used to introduce the bifactor approach (Models 4-6). Then, because of the relatively poor fit of these models to both data sets, improved fit was sought by introducing pairs of covarying item residuals identified by Lagrange modification indices and by exploratory factor analysis of the SCoA with the Brazilian data (Models 7-8). Data analysis first evaluated the fit of each of the eight models, using the mean- and variance-adjusted weighted least squares (WLSMV) estimator. Then, configural invariance (unconstrained model) and metric invariance (equivalent regression weights) between the two samples were evaluated, also using the WLSMV estimator, for models that had converged with adequate fit. Strong invariance (i.e., equivalent regression weights, intercepts, factor means, thresholds, and residuals) was tested for the best fitting model. All analyses were performed using Mplus version 7.0.
The following fit indexes were used: the comparative fit index (CFI), the root mean square error of approximation (RMSEA), gamma hat, and the weighted root mean square residual (WRMR). A good data fit occurs when gamma hat and CFI are ≥ 0.95 and RMSEA < .06 (Hu & Bentler, 1999; Schumacker & Lomax, 2004). CFI and gamma hat values between 0.90 and 0.95 suggest an acceptable data fit, as do RMSEA values between 0.06 and 0.09. Values outside these ranges suggest the model does not fit the data sufficiently to be accepted. Adequate fit is indicated when the WRMR is close to 1.00, though this is an experimental fit index and little is known concerning values that indicate rejection (Yu, 2002). We use the recommendation of Cheung and Rensvold (2002) that ∆CFI ≥ 0.01 indicates that the more constrained model produces a worse data fit than the less constrained model. In this case, the lack of invariance of the model indicates the less constrained model is preferred.
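These decision rules can be expressed compactly. The sketch below encodes the cut-offs from this paragraph; the fit values in the usage example are hypothetical, not results from this study.

```python
def fit_category(cfi, rmsea, gamma_hat):
    """Classify model fit using the cut-offs adopted in this study
    (Hu & Bentler, 1999; Schumacker & Lomax, 2004)."""
    if cfi >= 0.95 and gamma_hat >= 0.95 and rmsea < 0.06:
        return "good"
    if cfi >= 0.90 and gamma_hat >= 0.90 and rmsea <= 0.09:
        return "acceptable"
    return "inadequate"

def is_noninvariant(cfi_less_constrained, cfi_more_constrained, cutoff=0.01):
    """Cheung & Rensvold (2002): a CFI drop of .01 or more under added
    constraints means the constrained model fits worse (noninvariance)."""
    return (cfi_less_constrained - cfi_more_constrained) >= cutoff

# Hypothetical illustration only:
print(fit_category(cfi=0.93, rmsea=0.07, gamma_hat=0.92))  # acceptable
print(is_noninvariant(0.94, 0.92))  # True: constrained model fits worse
```

The same two functions cover both stages of the analysis: screening each model's fit per sample, then comparing nested invariance models via ΔCFI.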

Results
The eight models were evaluated for the Brazilian and New Zealand samples separately and fit values inspected (Table 2). For Models 1-3, based on the original SCoA-VI specification, fit was generally acceptable for both groups.
Of the initial bifactor models (Models 4-6), only Model 4, using the four main factors, reached convergence, but with unacceptable data fit in the New Zealand sample (see CFI index in Table 2). Hence, the development of Models 7 and 8 was necessary to determine whether an underlying bifactor model was present. Although Model 7 fit the Brazilian data well, it did not converge for the New Zealand data. Model 8 (i.e., bifactor plus three factors and three covarying pairs of residuals) had good data fit in the Brazilian sample and acceptable fit to the New Zealand data.
Based on these results, two-group invariance tests were run with the WLSMV estimator for all models except Models 5-7. Table 3 shows the configural invariance fit of the tested models. Model 2 did not converge and Model 4 had an unacceptable data fit (see CFI in Table 3). Models 1, 3, and 8 all had acceptable CFI and RMSEA values, but only Model 8 also had gamma hat > 0.90 and an RMSEA value close to the 0.06 threshold.
The four proper-solution models (i.e., 1, 3, 4, & 8) were analyzed for metric equivalence (Table 4). Model 1 presented a difference in CFI > .01 between the metric-equivalent model and the configural model, indicating that Model 1 is not invariant in terms of regression weights. Model 3 showed ∆CFI = .01, indicating that this model is also not invariant in terms of regression weights. Model 4, in turn, showed an unacceptable data fit (see CFI in Table 4). The only metric-invariant model was Model 8 (i.e., Bifactor Model #5 with three unique factors and three pairs of correlated residuals). This model was subsequently tested for strong invariance (i.e., equivalent regression weights, intercepts, factor means, thresholds, and residuals) and was found to have unacceptable fit (χ² [1092] = 4671.48; χ²/df = 4.28; RMSEA = .080; CFI = .87; gamma hat = .82) and to be non-equivalent between groups (ΔCFI > .01). Hence, we conclude that there is only metric equivalence in the best fitting two-group model between New Zealand and Brazil student responses to the SCoA.

Discussion
The best model discovered in this study (i.e., Model 8: Bifactor #5 containing a general factor predicting all items plus three factors and three pairs of covarying residuals) adds to our understanding of previous data analyses of the SCoA inventory. The bifactor model seems to have identified correctly, as did previous research (Weekers, Brown, & Veldkamp, 2009), that there is a general latent trait accounting for substantial covariance among the SCoA items. Clearly, when thinking about the purposes of the items, there is a common latent trait driving responses, perhaps the reason why the Improvement, External, and Affect factors are positively correlated and all inversely correlated with Irrelevance. The presence of three domain-specific factors reinforces the claim that the purposes do have additional meaning to the general function of assessment, strengthening the claim that the SCoA is multi-dimensional. Hence, this study advances our understanding of the SCoA dimensionality.
While the introduction of correlated residuals has been given warrant (Byrne, 2001), this is a step that ought to be taken cautiously since it rests on the presumption that the unexplained variance of one item systematically covaries with the unexplained variance of another item, but not with other items. A more cautious approach considers that unexplained variance has a zero relationship with all other error variances. Nonetheless, the three pairs of error covariances in the preferred model do not appear completely random. The three pairs were:
• Pair 1: Items 8 and 9 from Assessment Improves Teaching;
• Pair 2: Items 6 and 31 from Assessment is Enjoyable; and
• Pair 3: Items 24 and 11 from Assessment Evaluates School Quality.
It is clear that the pairs of items came from matching SCoA factors and had very similar wording. This suggests either that there were insufficient items to detect the intended factor or that the items function as 'bloated specifics', artificially creating a scale because of repeated wording (Kline, 1994). Since Model 1 with eight specific factors had acceptable fit for each group separately, it is likely that the factors do exist and have simply been insufficiently operationalized once the bifactor approach was introduced. This suggests future studies should attempt to create more items around these three constructs to ensure that the specific contribution the factors make can be detected even after the shared general factor is introduced. It is also probable that greater specificity in these constructs would improve invariance analysis results.
This research adds to our understanding of the cross-cultural properties of the SCoA because a previous two-country comparison of the same data sets (i.e., Brazil and New Zealand) (Matos & Brown, 2015) showed a lack of configural and metric invariance. The introduction of the bifactor, combined with the restructuring of the unique factors in the SCoA and the introduction of three pairs of correlated residuals, resulted in metric equivalence between countries. This indicates that, while item intercepts are not necessarily equal for Brazilian and New Zealand students, the regression slopes between the latent traits and the items differ only by chance. This suggests that the SCoA inventory may have cross-cultural validity between countries with quite different higher education arrangements, perhaps because of the fundamentally similar role assessment plays in higher education (i.e., it evaluates student learning).
It seems reasonably safe to conclude that any difference in factor means and inter-factor correlations between the New Zealand and Brazil samples is a function of differences in populations and environments rather than deficiencies in estimation method or model specification. The common model across samples fits equally well and is partially invariant; hence, the differences are best explained by reference to different ecologies rather than deficient measurement.