CONSTRUCTION AND STUDY OF VALIDITY EVIDENCE OF THE TEACHING ASSESSMENT SCALE

The objective of this study was to present the construction of the Teaching Assessment Scale (TAS) and the validity evidence for it. The scale was developed through a literature review and interviews with graduate students. The evidence for content validity of the TAS was evaluated by ten judges, followed by analysis of the evidence for construct validity and reliability. The polychoric correlation matrix was submitted to exploratory factor analysis, and the Hull method was used to decide the number of dimensions to be retained. Item response theory (IRT) analysis was performed using the rating scale model, with the result that seven items needed to be excluded. The Kaiser-Meyer-Olkin (KMO) index and Bartlett's Test of Sphericity indicated that the polychoric correlation matrix was factorable. The Hull method suggested the retention of a single factor, with an eigenvalue of 15.49. The factor's reliability measures were α = 0.96 and ω = 0.95. As a result, the TAS is considered helpful in evaluating higher education teaching methods in Brazil.


INTRODUCTION
According to Korthagen (2004), the definition of a good teacher consists of a complex construct that involves both objective and subjective aspects. The author reports that, starting in the 1950s, there was a significant effort on the part of researchers to understand this construct vis-à-vis the teacher's skills. Therefore, studies were developed to correlate teachers' behaviors with students' learning, resulting in lists of behavioral indicators identified as the abilities of a good teacher.
In the 1970s, researchers started to base their studies on a new paradigm called Humanistic Based Teacher Education (HBTE). This approach assumed that teachers had personal characteristics capable of influencing their professional conduct, apart from the isolated role of their competencies (Zeichner, 1983). However, HBTE met with resistance from many specialists, and both approaches remain targets of debate and controversy today.
Korthagen (2004), with the aim of understanding what defines a good teacher in a more holistic way, proposes a model involving different components: environment, behavior, competencies, beliefs, identity, and purpose. Environment encompasses aspects such as the classroom and the students, while behavior refers to the conduct of the teacher, as distinct from, for example, the activities and the students. Competencies cover, among other aspects, the knowledge and skills of teachers in relation to their teaching technique. These competencies are directly influenced by the teacher's beliefs (the fourth component), more specifically those related to the teaching-learning process. Identity refers to how teachers understand their role as professionals and is directly related to purpose, which involves their reasons for being a teacher and the inspiration for pursuing the occupation. According to Korthagen (2004), basing the definition of a good teacher on this model makes it possible to understand this construct in all its complexity, without ignoring fundamental aspects that influence the quality of professional performance, such as beliefs and identity. In this manner, starting with a broad conceptualization of the characteristics of a good teacher, three dimensions which seek to synthesize the levels described by Korthagen (2004) were defined: who a good teacher is, what a good teacher knows, and how a good teacher works. These dimensions underpin the construction of the Teaching Assessment Scale (TAS), intended for higher education teachers, which is described below.
The "who a good teacher is" dimension primarily involves the notions of professional identity and self-concept (Korthagen, 2004). Self-concept can be understood as the personal significance of aptitudes, interests, values and choices, which jointly comprise the unique characteristics of individuals in fulfilling their professional role (Super; Savickas; Super, 1996). When it comes to the teacher, professional identity can consequently be understood as a collection of self-concepts, that is, interests, values, goals, and personal history that are considered vocationally relevant (Korthagen, 2004). It deals, therefore, with a realm involving relationship skills, personality types, and emotional states. The following elements are situated in this dimension: enjoyment of what one does; valuing one's profession; engagement with the learning process; displaying professionalism; valuing the relationship with students; building a supportive environment; demonstrating unconditional respect and positive consideration for students; being open-minded; having a sense of professional identity; being empathic, understanding, and respectful with others; having a sense of humor; and being flexible and responsible (Korthagen, 2004; Osmun; Copeland, 2011; Pachane, 2012; Sutkin et al., 2008).
Regarding the "what a good teacher knows" dimension, Shulman (1987) notes that the teacher's knowledge is characterized by area of expertise as much as by content, knowledge of the curriculum, and knowledge of educational contexts. According to Harden and Crosby (2000), the teacher is responsible for transmitting new and relevant knowledge in an appropriate manner. The authors, when theorizing about the roles played by the teacher, stress that, in the process of transmitting information, good teachers should be capable of sharing personal thoughts and reflections with the students, making clear their line of reasoning and their particular view of the field. In this sense, they could, for example, present information that the student would not normally find in texts on the subject, as well as draw links between the content, the curriculum, and other aspects of the subject area in a critical and reflective manner. It needs to be stressed, therefore, that good teachers ought to have knowledge of the discipline's content (its substance, logic, form, and epistemology) and the ability to reflect deeply upon that which they propose to teach. Accordingly, one can understand this dimension as involving attributes related to the realms of content, experience, practical and theoretical bases, continuing education, and practical and theoretical revision (Azer, 2005; Feitoza; Cornelsen; Valente, 2007; Pachane, 2012).
Finally, the third dimension is called "how a good teacher works". Regarding the word work in this sense, Reed (1989) asserted that it is not so much the content that is important as its preparation and presentation. To the author, the primary purpose of a class should be to provide pertinent materials, presented in a stimulating manner and designed to facilitate comprehension. Consistent with this assertion, among the techniques used by the good teacher, studies cite: using exhaustive questioning to engage students in discussion; knowing how to deliver the right amount of supervision and independence; developing a supportive relationship with students; emphasizing solutions to problems; giving feedback; being clear in presenting content; having control of the classroom; knowing how to relate theory to practice; using appropriate language; stimulating high-level critical thinking skills; presenting difficult concepts in an understandable form; employing diverse teaching strategies; and stressing teamwork and collaborative learning (Azer, 2005; Feitoza; Cornelsen; Valente, 2007; Osmun; Copeland, 2011; Pachane, 2012).
The good teacher has been seen as a professional capable of motivating and engaging students in academic tasks and respecting their learning styles, above and beyond presenting themes in a clear, organized way. In the teacher's own perception, it is necessary to create a good plan and be well-prepared for class, combine practice with theory, be organized and communicative, like challenges, and be a motivator and guide of the students' learning (Samples; Copeland, 2013; Duarte, 2013). Thus, keeping in mind the variability and complexity of the aspects of the aforementioned dimensions, it is important to introduce psychometrically valid measures capable of assessing that construct. To this end, Avrichir and Dewes (2006) developed an instrument for self-assessment of teacher performance for graduate school professors. It consists of a scale intended to evaluate two factors: 1. teacher interest and challenge; 2. relationship with the student and evaluation.
Regarding teacher efficacy, Tschannen-Moran and Hoy (2001) developed the Ohio State Teacher Efficacy Scale, an instrument composed of 24 items divided into three factors: 1. instructional strategy efficacy; 2. classroom management efficacy; 3. student engagement efficacy.
Harden and Crosby (2000), for their part, presented a model of the functions performed by teachers of medicine and an instrument for evaluating them. The authors emphasized that there are 12 primary functions involved in the teaching of medicine and developed a questionnaire to evaluate the perception of teachers concerning the importance of these functions and their commitment to each of them. Also in the area of health, Stone et al. (2002) conducted semi-structured interviews with U.S. medical school teachers, from which emerged the following factors: 1. underlying humanitarianism; 2. familiarity with educational principles and practices; 3. appreciation of both the advantages and disadvantages of teaching; 4. self-image as a teacher.
By means of a qualitative case study approach, data from a two-year university developmental program was explored and analyzed with the result that factors surrounding teacher education fell into three groups: 1. personal (cognitive and emotional factors unique to the individual); 2. relational (connections and interactions with others); 3. contextual (the program itself and external work environments) (Lieff et al., 2012).
None of the instruments cited above has been adapted to or validated for use in the Brazilian context. In Brazil, instruments for teacher evaluation and self-assessment are used in teaching institutions, particularly in higher education. However, these instruments are generally devised by the institutions themselves and have not usually been studied psychometrically before being used. It is likely that many of these instruments were created in response to the 2004 creation of the National System for Higher Education Evaluation (SINAES, or in Portuguese Sistema Nacional de Avaliação da Educação Superior). SINAES proposed an institutional evaluation composed of several complementary instruments and included the implementation of a system of self-assessment to be undertaken within the institutions themselves via Internal Evaluation Committees. Despite following a set script, the institutional self-assessment is implemented through distinct instruments to which the higher education institutions (HEIs) can add items they judge pertinent (Ristoff; Giolo, 2006). One of the most common and constant components of the self-assessment processes is the evaluation of teaching work by students (Gomes; Borges, 2008). At the national level, teaching evaluation lacks instruments that consider the different aspects of the construct for evaluating the teacher's practice. Besides this, it is important to develop an instrument that specifically attempts to evaluate graduate-level educators' teaching practices in the Brazilian context, given the operating peculiarities and specifics of the field.
Consequently, the objective of this study was to present the construction of the instrument called the TAS - Professor Version, along with evidence for its validity.

METHOD

CONSTRUCTION OF THE TEACHING ASSESSMENT SCALE - PROFESSOR VERSION
To construct the scale, a literature review was carried out, starting with the definitions of the three dimensions of the instrument (who a good teacher is, what a good teacher knows, and how a good teacher works). Additionally, interviews with graduate students were conducted to survey how they define a good teacher. Students from various majors at public and private institutions in three Brazilian states (Rio Grande do Sul, Minas Gerais and Ceará) were contacted to be interviewed. The interviews took place in person or via e-mail, and the students were asked to describe a good graduate school teacher, considering their skills and competencies. Twenty-two graduate students participated in this stage. The number of interviews was determined according to the saturation criterion, that is, the interviews stopped when the students' descriptions of good teachers began to repeat (Fontanella; Ricas; Turato, 2008).
The students' responses pertaining to the characteristics of a good teacher were analyzed and categorized through qualitative content analysis (Bardin, 1977), thereby generating a series of items that composed the scale. The items were presented to a focus group of 11 graduate students. The group analyzed the items, adapting them to the criteria for item preparation proposed by Pasquali (2010). Items with highly similar content were excluded. In the end, 71 items, distributed among the three theoretical dimensions, were selected. The response options for the TAS were defined on a five-point Likert scale, in which 1 represents "never"/"not at all" and 5 "always"/"completely", according to how well the professor embodied the statements posed. At the end of this process, the instrument was named the TAS - Professor Version.
The completed scale was submitted to ten judges for objective assessment of evidence for content validity, as suggested in Pasquali (1998). All the judges were teachers with graduate school teaching experience (from 2 to 25 years); some had a background in psychometrics and others in education. The judges received both the dimension definitions and the dimension-sorted items. They were asked to evaluate each item's comprehensibility, relevance, and pertinence to the assigned dimension. Space was also provided for them to note any writing problems or suggest the removal of items or the addition of new items.
The results of the judges' analyses were discussed by the authors, and items were modified or excluded as necessary. At the end of this stage, 57 items remained in the TAS. A pilot study was carried out to assess the preliminary characteristics of this version of the instrument. Thirty-two private and public university professors from 30 graduate programs, with an average of 10.85 years of teaching experience (standard deviation (SD) = 10.59), participated. Based on the analysis of the responses from the pilot study, four items highly correlated (over 0.60) with other items were removed, as they were considered redundant. The version of the TAS resulting from the pilot study thus contained 53 items.
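The redundancy screening described above can be illustrated with a minimal sketch. It assumes Pearson correlations and positive associations only, since the text does not state which coefficient was used; the item labels and pilot scores are hypothetical and are not data from the study.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation between two equally long lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def redundant_pairs(responses, threshold=0.60):
    """Return (item_a, item_b, r) for item pairs correlated above the threshold.

    `responses` maps an item label to its list of scores, one per respondent.
    """
    flagged = []
    for a, b in combinations(sorted(responses), 2):
        r = pearson(responses[a], responses[b])
        if r > threshold:
            flagged.append((a, b, round(r, 2)))
    return flagged

# Hypothetical pilot data: three items answered by six respondents.
pilot = {
    "item_01": [5, 4, 4, 3, 5, 2],
    "item_02": [5, 4, 5, 3, 5, 2],  # nearly duplicates item_01
    "item_03": [3, 5, 2, 4, 3, 4],
}
flagged = redundant_pairs(pilot)
print(flagged)
```

Which item of a flagged pair to drop remains a substantive judgment about content, which the sketch does not attempt to automate.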

EXAMINATION OF THE EVIDENCE FOR CONSTRUCT VALIDITY AND RELIABILITY
For the analysis of the evidence for construct validity and reliability of the TAS, responses were obtained from 194 professors active in graduate programs in different regions of Brazil, particularly in the South (63.9%) and Southeast (26.8%). Chart 1 summarizes the primary descriptive characteristics of the sample.
The majority of professors were female (61.9%), with an average age of 41.88 years (SD = 11.9) and an average time active in higher education of 12 years (SD = 10.2). The most common degree attained was the doctorate (48.1%). The majority of professors were active in one or more graduate programs (primarily Psychology, Engineering and Administration), predominantly in private universities (53.6%), and most were employed full time (59.3%).

PROCEDURES
This project was submitted to and approved by the Committee for Ethics in Research of the Institute of Psychology at the Universidade Federal do Rio Grande do Sul (UFRGS) (no. 450,393, 4 November 2013). The version of the TAS resulting from our construction process, along with a sociodemographic data questionnaire, was made available online via SurveyMonkey®. Those invited to participate were contacted by e-mail or through social networks. The participants had to confirm their participation by accepting the Informed Consent Term before filling out the scale.

DATA ANALYSIS
Given the ordinal nature of the responses, exploratory factor analyses using the unweighted least squares (ULS) and mean- and variance-adjusted weighted least squares (WLS) (Muthén; Du Toit; Spisic, 1997) extraction methods were carried out on the polychoric correlation matrix (Holgado-Tello et al., 2010). The Hull method (Lorenzo-Seva; Timmerman; Kiers, 2011) was used to determine the number of dimensions to be retained for analysis. This method contrasts the degrees of freedom with the fit indices of several possible solutions for the same matrix, and the factor solution with the best balance between the two is retained. Following this, item response theory (IRT) analyses were undertaken using the rating scale model of Andrich (1978), adapted for polytomous items. The model independently estimates the difficulty of the items (δ) and the position of individuals on the estimated linear latent continuum (θ) in log-odds units (logits). The fit of the items to the measurement model (infit and outfit), the dimensionality of the measurement residuals (principal contrasts), and the local dependence of the items (residual correlations) were estimated.
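To make the roles of θ and δ concrete, the following minimal sketch computes response-category probabilities under the usual formulation of the rating scale model, in which every item shares one set of step thresholds. The threshold values and function names are hypothetical illustrations, not estimates or software from this study; categories 0 to 4 stand for the five response options.

```python
import math

def rsm_category_probs(theta, delta, thresholds):
    """Category probabilities under the rating scale model.

    theta: person location (logits); delta: item difficulty (logits);
    thresholds: step parameters shared by every item.
    Returns the probabilities of categories 0..len(thresholds).
    """
    cumulative = [0.0]  # category 0 corresponds to an empty sum
    for tau in thresholds:
        cumulative.append(cumulative[-1] + (theta - delta - tau))
    peak = max(cumulative)  # subtract the maximum for numerical stability
    weights = [math.exp(c - peak) for c in cumulative]
    total = sum(weights)
    return [w / total for w in weights]

def expected_score(probs):
    """Model-implied mean category for one person-item encounter."""
    return sum(k * p for k, p in enumerate(probs))

# Hypothetical thresholds for a five-category item.
taus = [-1.5, -0.5, 0.5, 1.5]
matched = rsm_category_probs(theta=0.0, delta=0.0, thresholds=taus)
able = rsm_category_probs(theta=2.0, delta=0.0, thresholds=taus)
print([round(p, 3) for p in matched])
print(round(expected_score(able), 3))
```

When θ equals δ and the thresholds are symmetric around zero, the category distribution is symmetric; raising θ relative to δ shifts probability toward the higher categories, which is the sense in which "easier" items are endorsed more strongly.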

RESULTS
The Kaiser-Meyer-Olkin (KMO) index was 0.89, and Bartlett's test (4993.2, degrees of freedom (df) = 1378, p < 0.001) indicated that the polychoric correlation matrix of the items was factorable. The Hull method suggested retaining a single factor, with an eigenvalue of 15.49. The one-dimensional model yielded a fit of χ² = 2874.12 (df = 1325), p < 0.001, using the ULS method, and χ² = 2506.786 (df = 1325), p < 0.001, using the WLS method. The reliability measurements for the factor were α = 0.96 and ω = 0.95. The factor loadings of the items were all suitable, being above 0.32. Subsequently, the same set of items was analyzed using the rating scale model. The items' fit to the measurement model was analyzed using the infit and outfit values. These indices pertain to the residuals and should lie between 0.5 and 1.5. The closer the value is to 1, the better the fit between the item's difficulty level and the latent trait level of the individuals who do or do not endorse the item in question. Chart 2 presents the results of this analysis. In it, the items are ordered by difficulty parameter, and the infit and outfit values are presented. The reliability measurement of the items estimated by the rating scale model was 0.96.

Chart 1 - Descriptive data from the teacher sample (n = 194).
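As an illustration of how infit and outfit relate to the residuals, the sketch below uses the common mean-square definitions: outfit is the average squared standardized residual, and infit is the sum of squared residuals divided by the sum of the model variances. The observed scores, expected scores, and variances are hypothetical, and the 0.5 to 1.5 band is the criterion stated in the text.

```python
def fit_mean_squares(observed, expected, variance):
    """Return (infit, outfit) mean squares for a single item.

    observed: scores given by each respondent;
    expected: model-implied expected score for each respondent;
    variance: model-implied score variance for each respondent.
    """
    squared = [(x - e) ** 2 for x, e in zip(observed, expected)]
    # Outfit: unweighted mean of squared standardized residuals.
    outfit = sum(s / w for s, w in zip(squared, variance)) / len(observed)
    # Infit: information-weighted, so low-variance encounters count less.
    infit = sum(squared) / sum(variance)
    return infit, outfit

def within_range(value, low=0.5, high=1.5):
    """Check a mean square against the band used in the text."""
    return low <= value <= high

# Hypothetical item answered by five respondents; the last response is
# far from what the model expects for that respondent.
observed = [3, 4, 2, 5, 1]
expected = [3.2, 3.8, 2.5, 4.6, 3.9]
variance = [0.9, 0.8, 1.0, 0.4, 0.5]
infit, outfit = fit_mean_squares(observed, expected, variance)
print(round(infit, 2), round(outfit, 2), within_range(outfit))
```

In this made-up case a single unexpected response pushes both indices above the band, and outfit more than infit, because outfit does not down-weight encounters with small model variance.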

The dimensionality was assessed by means of principal contrasts, in which the residual variance of the items is submitted to principal component analysis. The presence of contrasts with eigenvalues greater than or equal to 2 indicates possible additional dimensions influencing the response patterns beyond the measured trait. Finally, the residual correlations of the items were estimated to investigate cases of local dependence between them. High residual correlations indicate that the probability of endorsing a particular item is not determined solely as a function of the latent trait, but depends on the response to another item. According to these criteria, seven items were excluded (5, 9, 14, 28, 29, 46 and 49). The final version of the scale, with 46 items, is presented in Appendix 1.
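The eigenvalue criterion for principal contrasts can be sketched as follows, assuming a correlation matrix of standardized residuals is already available. Power iteration is used here only to keep the example dependency-free; it is not the procedure of any particular Rasch software, and both matrices are hypothetical.

```python
import math

def largest_eigenvalue(matrix, iterations=1000):
    """Approximate the largest eigenvalue of a symmetric matrix by power iteration."""
    n = len(matrix)
    # Uneven start vector, so it is unlikely to be orthogonal to the leading contrast.
    vector = [1.0 / (i + 1) for i in range(n)]
    for _ in range(iterations):
        product = [sum(matrix[i][j] * vector[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(p * p for p in product))
        vector = [p / norm for p in product]
    product = [sum(matrix[i][j] * vector[j] for j in range(n)) for i in range(n)]
    return sum(v * p for v, p in zip(vector, product))  # Rayleigh quotient

def suggests_extra_dimension(residual_correlations, cutoff=2.0):
    """Apply the 'eigenvalue of at least 2' criterion to the first contrast."""
    return largest_eigenvalue(residual_correlations) >= cutoff

# Hypothetical residual correlations: one mildly dependent pair of items.
mild = [
    [1.0, 0.6, 0.0, 0.0],
    [0.6, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.1],
    [0.0, 0.0, 0.1, 1.0],
]
# Hypothetical residual correlations: three items sharing leftover variance.
clustered = [
    [1.0, 0.6, 0.6],
    [0.6, 1.0, 0.6],
    [0.6, 0.6, 1.0],
]
print(round(largest_eigenvalue(mild), 3), suggests_extra_dimension(mild))
print(round(largest_eigenvalue(clustered), 3), suggests_extra_dimension(clustered))
```

The first matrix has a leading eigenvalue below 2 despite one correlated pair, so that pair would be a candidate for local dependence rather than a second dimension; the second matrix crosses the cutoff because several items share residual variance.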

DISCUSSION AND FINAL CONSIDERATIONS
From this study, a Brazilian instrument for self-assessment of teaching practice for graduate school teachers was constructed. The instrument may be useful for teachers in monitoring the activities that teaching demands, as well as for reflection on and improvement of their own practice. With its basis in the literature, the conceptions of students regarding the practices of a good teacher, and the judgments of specialists, the TAS - Professor Version has the potential to be a valuable instrument in the evaluation of higher education practices in Brazil. Darling-Hammond (2010) states that the participation of teachers in the process of evaluating their own performance can be reflected in the effectiveness of their teaching. This is because evaluation is a diagnostic process which can form the basis for necessary changes in the actions of, as well as in the training of, these professionals. In addition, in the Center for American Progress report Evaluating Teacher Effectiveness: How Teacher Performance Assessments Can Measure and Improve Teaching, the author suggests that teachers who continually evaluate their own performance tend to adopt new and more effective teaching methodologies. This view of evaluation aligns with the one recommended by Marques (2010). For that author, evaluation is a process of constant monitoring and improvement of the quality of higher education. Thus, there is a need for good instruments, such as the TAS, which enable systematic evaluation of teaching practice.
Additionally, it should be noted that there is a scarcity of studies of teaching evaluation instruments for the Brazilian context that take into account this construct's various facets. On the other hand, one must recognize that many of the country's universities are developing their own measures for this end, many of them evaluating the performance of teachers based on the perception of their students (Gomes; Borges, 2008). The information coming from these measures is important for institutional monitoring and the formulation of specific programs aimed at improving the quality of instruction. However, these instruments are generally not studied psychometrically and, thus, their quality is questionable. In this sense, it is understood that the TAS, having been constructed based on both theory and empirical data and respecting the necessary psychometric procedures for the creation and validation of such an instrument (Pasquali, 1998; Urbina, 2007), can be useful for evaluating higher education teaching practice. After searching the Brazilian literature, it was determined that this is the first self-assessment instrument for Brazilian teaching created according to the aforementioned standards.
In the process of construction, principles for creating such instruments based on those proposed by Pasquali (1999; 2010) were followed. This author recommends that three types of procedure be followed: theoretical, empirical, and analytical. Theoretical procedures include literature reviews, surveying the empirical evidence on the conception of a good teacher, and creation of the instrument items. This step leads into the empirical procedures, which consist of the pilot study and data collection for analysis of the validity evidence. The use of statistical analyses to confirm the psychometric properties of the TAS corresponded to the analytical procedures and completed the construction process. Thus, the TAS - Professor Version was constructed with the theoretical and methodological rigor recommended by the literature for creating such instruments. The scale possesses psychometric qualities tested and approved for self-assessment of graduate school teaching practice and is in accord with international standards for educational and psychological testing (AERA, APA, NCME, 1999).
The analyses suggest the instrument has a one-dimensional structure, in contrast with the theoretical model of three dimensions adopted for its construction. One can imagine that the three theoretical dimensions might be difficult to separate empirically, and that, for example, a teacher's flexible personal style will be associated with the way that teacher manages possible modifications to the lesson plan. Hence, the dimensions dynamically influence each other and compose distinct patterns of behavior. The analysis of the item difficulty parameters made it possible to establish groups of items with similar contents, reflecting different levels of teaching ability. For example, in Chart 2, the easier items (item sequence 13 to 22) have their roots in the broader and more formal, almost bureaucratic, aspects of the function of teaching. They involve basic ways of being, doing, and knowing for the role, and are the minimum one would expect from a teacher. On the second level (item sequence 31 to 50), there are the skills involved in mastering, monitoring, and controlling the teaching process. Relational aspects, such as incentive, feedback, and fostering a critical stance, are integrated into a deeper level of the content being worked. Finally, in the last stage (the sequence from 19 to 3), are the advanced characteristics of a teacher, which demand a high level of adaptability and handling of students' demands, without losing sight of the clear benchmarks for evaluating the process.
Given the one-dimensionality of the TAS found in the present sample, in contrast to theories positing higher dimensionality, new studies with broader samples should be undertaken to confirm the factor structure. Beyond this, studies investigating other evidence for the validity of the instrument based on external criteria, such as the judgment of students concerning the competencies of their teachers, are recommended. With this in mind, a version of the TAS for students is already being constructed. Research comparing the practice of teachers of different subject areas who operate in both public and private sectors will be able to gather useful information on which to base changes in the training and professional operation of higher education teachers in Brazil.
We believe that the present study contributes to the advance of knowledge and to the evaluation of teaching practice in Brazil, as it presents a specific and valid instrument for this end. It is hoped that its use can have an impact on four levels: individual, institutional, political, and theoretical-scientific. In individual terms, teachers will be able to evaluate their skills and competencies and reflect on aspects of their work to be improved. At the institutional level, the HEI can use the results of the TAS, along with the results of other instruments, to implement strategic changes in the process of teaching. On the level of educational policy, it is hoped that this instrument can contribute to the SINAES by helping to standardize the recording of information concerning teaching practice. Finally, from the theoretical-scientific point of view, the set of information produced by this and future research could help rethink the theoretical definition of the good teacher.
(A free translation of the TAS - Professor Version follows. The only appropriate version of the TAS for use is the original in Portuguese. For adequate use in another language, we strongly suggest a process of cultural adaptation. For more information, please read Borsa, Damasio and Bandeira, 2012. For any use of the TAS, please contact the first or the last author of this manuscript.)

This is an instrument for reflecting on your teaching activity in higher education. With it, you will have the opportunity to identify and track your teaching competencies, and improve or develop them. In order for this process to be as effective as possible, be sincere in your responses and think carefully about your teaching work in one college-level discipline.

Discipline: _______________________________________________________

Respond to each item according to how well it applies to you using the 1 to 5 scale below:

guide students' search for materials to complement the content worked in class 1
keep myself up-to-date on what I am teaching 1 2 3 4 5
25. I give negative feedback to my students when it's necessary 1 2 3 4 5