Systematic mapping of literature on teacher evaluation (2013-2017)
Claudia Navarro CORONA; María Soledad Ramírez MONTOYA

International research has shown that it is possible to maintain quality education without external educational evaluations; however, educational systems increasingly regard evaluation as a support for teacher professionalization. The objective of this article is to characterize the scientific production on teacher evaluation published in the five-year period between 2013 and 2017, selected through inclusion criteria applied in the Scopus and Web of Science databases and through thematic and duplicate exclusion criteria. Through the methodology of systematic literature mapping, 106 articles were found, and the main languages and countries in which the scientific production on teacher evaluation has been produced, its accessibility and the types of work produced were identified. The topics of greatest impact were identified and classified according to consolidated and emerging lines of research. It is concluded that the lines of research that link teacher evaluation to the professionalization of teaching are underdeveloped, and that the concern regarding the quality of evaluation processes and instruments remains current in scientific production. The systematic mapping of literature offers a careful selection of works on the subject and enables researchers and interested readers to trace precise paths of inquiry.


Introduction
International research shows that it is possible to maintain a quality education without third-party educational evaluations. The educational systems that grant more autonomy to schools place greater emphasis on internal evaluation processes; this implies greater attention to teacher training and professionalization schemes or to their connection with evaluation procedures. An example is Finland, where the educational system has not only forgone external evaluation processes; the topic is not even an issue. Also, in Western Europe, teacher evaluation is understood more as a process of reflection than as a quality control system (MURILLO, 2007).
The purposes of teacher evaluation in each country are linked to traditions and to the degree of decentralization of their educational systems. In countries where systems give more autonomy to schools, the purposes of teacher evaluation focus on attracting the best candidates to the teaching profession and keeping teachers motivated throughout their careers. In Latin America, evaluation processes are more oriented towards the regulation of individual careers through promotions, salary increases, and raising the quality of teaching work (OECD, 2013a, 2013b).
Internationally, teacher evaluation has tended to pursue summative goals, especially in centralized education systems; in countries with greater autonomy, a greater orientation towards formative evaluation has been identified, in which the information retrieved is used for the professionalization of teachers (OECD, 2013a, 2013b). International organizations such as the Organization for Economic Cooperation and Development (OECD) have emphasized that the effectiveness of evaluations lies in the possibility of increasing the skills and competencies of those evaluated (OECD, 2013c). Reorienting teacher evaluation toward formative rather than summative ends makes it timely to explore previous experiences in order to identify alternatives for the evaluation of teaching staff.
Production on teacher evaluation is extensive. A simple search for the phrase "evaluation of teacher" in specialized databases such as Scopus or Web of Science (WOS), for example, yields more than 25,000 documents that include this keyword in their titles, which illustrates the volume of production.
In this framework, the general objective of this article is to characterize the scientific production on teacher evaluation published in the 2013-2017 period. The general research question is: what has scientific production on teacher evaluation been like in the last five years? Through mapping, five specific objectives are achieved: (1) identify production trends between 2013 and 2017, (2) identify the languages and countries in which scientific research has dealt with teacher evaluation, (3) characterize the production on teacher evaluation as to its accessibility, type of production and publication spaces, (4) identify the specific topics of scientific production on teacher evaluation in the international context, and (5) identify the works that have had the greatest impact between 2013 and 2017. The purpose of the work is to offer a navigation map that allows readers interested in teacher evaluation to become acquainted with a set of publications selected on the basis of explicit criteria and to choose the routes most appropriate to their inquiry interests.
The present work consists of four sections in addition to this introduction. The second section presents the main difficulties that educational research has identified in the processes of teacher evaluation. The third describes the method of systematic mapping of literature and presents it as an alternative for selecting works and exploring a large number of publications; the methodological phases and their steps are described. The fourth section contains the results of the mapping, organized according to the five specific questions that guided the extraction of information from the databases; each subsection corresponds to a research question (RQ). The fifth section presents the conclusions.

The teaching evaluation
A review of evaluation models in different educational systems shows that teacher evaluation has a longer tradition in processes that accredit teachers and their abilities in order to make decisions regarding promotion, salary increases or even permanence in the post (MURILLO, 2007). Teacher evaluation has become one of the main mechanisms for regulating the career and professionalization of teachers. Both educational policy and research in the field have encountered different difficulties in the process. Mateo (2000) identifies as the main difficulties of teacher evaluation (1) the conceptual definition of the criteria used for teacher evaluation, (2) the technical quality of measurement instruments, (3) the insertion of evaluation among the processes of educational systems and their policies, (4) the definition of a legal and regulatory framework that legitimizes processes, makes them official and guarantees the fulfillment of rights, (5) the installation of evaluation cultures for improvement and (6) the protection of the information and honor of those evaluated.
These approaches have also been addressed by different researchers around the world, who, from exploratory or critical positions, have made different recommendations that can contribute to the solution of the difficulties.Among the main recommendations found are the following.
a) The search for congruence between the theoretical approaches that underlie teaching practice and the evaluation model in which teachers participate (GOODWIN; WEBB, 2014).
b) The definition of criteria that help clarify and standardize what it means to be a good teacher and what is a good teaching practice (LOONEY, 2001; BADRTDINOV; GOROBETS, 2016).
c) The construction of quality instruments and the assurance of equitable and controlled processes (PELLEGRINO; DIBELLO; GOLDMAN, 2016).
d) The use of results for decision-making only when quality criteria are guaranteed both in the instruments and in the evaluation procedures and processes (WARRING, 2015).
e) The integration of evaluation, training and professionalization processes to perfect the practice (ÁVALOS, 2007; VAILLANT, 2008).
The considerations made by experts invite us to consider that evaluation processes must be examined and retooled to promote a more formative approach that articulates goals, theoretical approaches, technical quality and use of the results (MURILLO, 2007).

Systematic mapping of literature
The objective was to carry out a systematic mapping of scientific production on teacher evaluation. Systematic mapping is a particular type of literature review; it is considered a secondary study (KITCHENHAM; CHARTERS, 2007), used to identify, evaluate and synthesize research, mainly of a primary nature though not excluding other types of publications, with the aim of answering questions raised beforehand to guide the review (SINOARA; ANTUNES; REZENDE, 2017). The mapping can be a study in itself or an early stage of a systematic literature review; in this case, the mapping is carried out as a first phase, applied as a search and selection strategy.
To define the methodological route of this study, the proposals of Petersen et al. (2008) and Sinoara, Antunes and Rezende (2017) were adopted. The design was structured in six steps organized into four methodological phases. Figure 1 schematizes the mapping process. Each of the phases is detailed below.

Phase 1: Definition
In the first phase, the problem was defined through the statement of research questions that would guide the subsequent phases, from the search to the analysis of the information. The questions were formulated so as to make it possible to navigate the universe of production on the subject. Five guiding questions were formulated; these are detailed in Table 1.

Question | Information needed
RQ1. What production trends are observed in the period between 2013 and 2017 on the topic of teacher evaluation? | Increase or decrease in production over the years.
RQ2. In which languages and in which countries is research on teacher evaluation produced? | Languages. Countries.
RQ3. How is the production on teacher evaluation, regarding the type of work and accessibility? | Open or closed access. Types of documents: articles, books, chapters, others.
RQ4. What works have had the greatest impact on the scientific production on teacher evaluation? | Number of citations.
RQ5. What lines have been developed in the research on teacher evaluation? | Specific topics of the production.
Source: Authors.

Phase 2: Locating scientific production
Searches of scientific production were conducted through the Scopus and Web of Science (WOS) databases. Two types of search queries were carried out:
(1) Pilot searches. Terms were entered into the databases and the type of documents retrieved was observed. The terms used were evaluation, assessment, teacher and teaching, and search terms were tested by combining them with the Boolean operators AND and OR. These searches made it possible to define the final descriptors.
(2) Final search. Table 2 shows the search terms used to locate the production to be analyzed.

Table 2-Terms used in the final search query
(TITLE ("evaluation of teacher") OR TITLE ("evaluation of teaching") OR TITLE ("Assessment of teacher") OR TITLE ("Assessment of teaching")) AND TITLE-ABS-KEY (education)
Source: Authors.
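The structure of the final query can be made explicit with a short sketch. The Python snippet below assembles a Scopus-style string from the four title phrases and the topic term reported in Table 2. The function name `build_query` is hypothetical, and the grouping of the four TITLE clauses before the AND is an assumption about how the query was intended to be read; neither is taken from the original study.

```python
def build_query(title_phrases, topic_term):
    """Join TITLE clauses with OR, then restrict them with a TITLE-ABS-KEY term.

    Hypothetical helper: the grouping of the OR clauses is an assumption.
    """
    title_block = " OR ".join(f'TITLE ("{phrase}")' for phrase in title_phrases)
    return f"({title_block}) AND TITLE-ABS-KEY ({topic_term})"


# Phrases and topic term as listed in Table 2.
phrases = [
    "evaluation of teacher",
    "evaluation of teaching",
    "Assessment of teacher",
    "Assessment of teaching",
]
query = build_query(phrases, "education")
print(query)
```

Writing the query this way keeps the parentheses balanced and makes it easier to keep the Scopus and WOS versions equivalent when a phrase is added or removed.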
An additional step was the selection of inclusion and exclusion search criteria to refine the results. The search queries were made as equivalent as possible in the two indexes consulted. Table 3 specifies the refinement criteria for each database.
The pre-analysis involved the first interactions with the documents. The titles and abstracts of each text were read to assess the relevance of each document identified in the search and to select documents during refinement. Inclusion and exclusion criteria were established. The items discarded were:
1. Documents that were duplicated in WOS.
2. WOS and SCOPUS works without an available abstract.
3. WOS and SCOPUS documents that were not relevant to the field of teacher evaluation because they refer to the evaluation of students carried out by teachers.
Table 4 shows the number of discarded documents.

Duplicated documents | 19
Documents without available abstract | 10
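The three exclusion criteria can be sketched as a simple filter. The snippet below is only an illustration under stated assumptions: the record fields (`title`, `abstract`, `relevant`), the matching of duplicates by normalized title, and the toy records are all hypothetical, and the `relevant` flag stands in for the reader's judgment after reading each title and abstract. The source does not describe how duplicates were detected.

```python
def normalize(title):
    """Lowercase and collapse whitespace so the same title matches across databases."""
    return " ".join(title.lower().split())


def apply_exclusions(records):
    """Apply the three exclusion criteria in order; return kept records and discard counts."""
    kept = []
    seen_titles = set()
    discarded = {"duplicate": 0, "no_abstract": 0, "not_relevant": 0}
    for rec in records:
        key = normalize(rec["title"])
        if key in seen_titles:              # criterion 1: duplicated document
            discarded["duplicate"] += 1
            continue
        seen_titles.add(key)
        if not rec.get("abstract"):         # criterion 2: no abstract available
            discarded["no_abstract"] += 1
        elif not rec.get("relevant", False):  # criterion 3: outside teacher evaluation
            discarded["not_relevant"] += 1
        else:
            kept.append(rec)
    return kept, discarded


# Invented toy records, not documents from the study.
records = [
    {"title": "Evaluation of teaching quality", "abstract": "Toy text.", "relevant": True},
    {"title": "Evaluation of  Teaching Quality", "abstract": "Toy text.", "relevant": True},
    {"title": "Assessment of teacher portfolios", "abstract": "", "relevant": True},
    {"title": "Assessment of teaching through pupil grades", "abstract": "Toy text.", "relevant": False},
]
kept, discarded = apply_exclusions(records)
print(len(kept), discarded)
```

Counting the discards per criterion in the same pass yields the kind of tally reported in Table 4.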
The results of the search and of the application of the refinement and selection criteria are summarized in Figure 2. Once the documents were selected, a database was prepared with the metadata provided by Scopus and WOS. The title, authors and abstract of each document were entered. Source information was included: database, year of publication, journal, country and language. Information about the characteristics of each document was also included: number of times it has been cited, type of document, pages, accessibility, volume and journal issue. Finally, a unique identification number was assigned to each document (see Mapped production section).
The analysis was done in two stages. In the first, the trend of scientific production was analyzed by date, language, country, type of publication and other characteristics that were established in the questions that guided the review. In the second, the content of the abstracts was reviewed and classified in order to identify the lines of research that have been consolidated in the scientific research on teacher evaluation. The lines were established inductively from the abstracts of the selected documents. The classification was recorded in the database (see Mapped production section).
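The database described above could be represented along the following lines. This is a minimal sketch, not the authors' actual tooling: the class `Document`, the helpers `assign_ids` and `count_by`, and the toy rows are hypothetical, and the fields for pages, volume and issue are omitted for brevity.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Document:
    doc_id: int        # unique identification number
    title: str
    authors: str
    abstract: str
    database: str      # "Scopus" or "WOS"
    year: int
    journal: str
    country: str
    language: str
    citations: int
    doc_type: str
    open_access: bool
    line: str = ""     # research line, filled in during the second stage


def assign_ids(rows):
    """Turn raw metadata dicts into Document records numbered from 1."""
    return [Document(doc_id=i, **row) for i, row in enumerate(rows, start=1)]


def count_by(documents, field):
    """First-stage analysis: number of documents per value of one field."""
    return Counter(getattr(doc, field) for doc in documents)


# Invented toy rows, not records from the study.
base = dict(title="Toy title", authors="Author A", abstract="Toy abstract.",
            database="Scopus", year=2015, journal="Toy Journal", country="Mexico",
            language="English", citations=0, doc_type="Article", open_access=False)
rows = [base, {**base, "database": "WOS"}, {**base, "year": 2017, "language": "Spanish"}]

documents = assign_ids(rows)
print(count_by(documents, "year"))
print(count_by(documents, "language"))
```

The same `count_by` call covers the first-stage questions on year, language, country and document type, while the `line` field is reserved for the inductive classification of the second stage.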

RQ1. Scientific research trends
An increase in scientific production can be observed since 2010; however, the present mapping only includes production between 2013 and 2017. When the databases are analyzed separately, production decreases in Scopus while it increases in WOS. Figure 3 shows the production in the period analyzed in both databases.
English is the main language of production, with 85.23% of the documents identified. One of the works (1.14%) was also published in Croatian. Another 13.64%, representing twelve works, was written in Spanish, Portuguese, Turkish, Chinese or French. For one of the documents, the database did not show language information; however, it was found that its language of publication is also English. Table 5 specifies the language of each reviewed document.
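The language percentages can be checked against the total of 88 mapped documents reported with the tables. In the sketch below, the counts of one Croatian work and twelve works in other languages come from the text, whereas the count of 75 English documents is an inference from the reported 85.23% and is not stated in the source.

```python
TOTAL = 88  # total number of mapped documents reported in the tables

# 1 and 12 are stated in the text; 75 is inferred from 85.23% of 88.
counts = {"English": 75, "Croatian": 1, "Other languages": 12}


def share(count, total=TOTAL):
    """Percentage of the total, rounded to two decimals."""
    return round(100 * count / total, 2)


shares = {language: share(n) for language, n in counts.items()}
print(shares)
```

Under this assumption the three groups add up to 88 documents and reproduce the three percentages given in the paragraph.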

RQ3. Characteristics of production on teacher evaluation
Most of the specialized production, 71.59%, was published as articles, while proceedings papers represent 11.36% of the production. Table 6 shows the documents according to their type.
Only 13.64% of the documents included in the mapping were published in open access journals. This percentage represents twelve documents. Table 7 specifies the works according to their availability.
3 All the data collected were taken from the databases, except the classification of document 13, which was included in neither Scopus nor WOS; the document itself had to be inspected to obtain its document type and language.
Total 88

Source: Authors
Ten open access journals were identified, representing 13.16% of the 76 that made up the sample. Seven journals are classified in three of the four quartiles of Scientific Journal Rankings (SJR: Q2, Q3 or Q4). The journals Profesorado and Cogent Education have the greatest number of publications on the topic of teacher evaluation. Table 8 lists the open access journals, their SJR classification and H index.
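Since an H index is reported for each journal, it may help to recall how this indicator is usually computed: it is the largest number h such that h publications have each received at least h citations. The sketch below illustrates that standard definition; the function name `h_index` and the citation counts are invented for illustration and do not correspond to any journal in the sample.

```python
def h_index(citations):
    """Largest h such that h items have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank      # the top `rank` items all have at least `rank` citations
        else:
            break
    return h


# Invented citation counts for a hypothetical journal.
example = [10, 8, 5, 4, 3]
print(h_index(example))
```

In this toy case four items have at least four citations each, but there are not five items with at least five, so the function returns 4.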

RQ4. Works with impact on the production
According to the SCOPUS and WOS databases, 68.2% of the works have not been cited in other scientific works; 31.8% have been cited at least once, and 22.73% (20 works) have been cited more than once. Table 9 shows the works and their citations. The document with the greatest impact on scientific production comes from Belgium [13]; however, Australia, the United Kingdom and the United States contribute a greater number of high-impact studies on teacher evaluation. The relationship between countries and works and their number of citations is presented in Table 10.
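The citation figures can be reconciled with a short calculation. The sketch below assumes the total of 88 mapped documents; the count of 20 works cited more than once is stated in the text, while the counts of uncited works and of works cited exactly once are inferred from the reported percentages and are not given in the source.

```python
TOTAL = 88                        # total number of mapped documents
more_than_once = 20               # stated in the text

uncited = round(0.682 * TOTAL)    # inferred from the reported 68.2%
cited = TOTAL - uncited           # works cited at least once
exactly_once = cited - more_than_once


def pct(count, digits):
    """Percentage of the total, rounded to the given number of decimals."""
    return round(100 * count / TOTAL, digits)


print(uncited, cited, exactly_once)
print(pct(uncited, 1), pct(cited, 1), pct(more_than_once, 2))
```

Read this way, the share of works cited more than once is a subset of the share cited at least once, which is why the first two percentages already add up to 100%.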
Source: Authors

RQ5. Lines of research developed on the production on teacher evaluation
The topic of each document was analyzed to identify which lines of research have been explored in teacher evaluation. Two sets were identified: a consolidated set, which includes lines of research with more than ten works, and an emergent set, which groups lines with fewer than ten works. Table 11 specifies the sets and the lines.
Save for the line Effect of evaluations, all of the lines had already been identified by 2013; as of 2017, they are still consolidating. Teaching and teacher evaluation, and Evaluation proposals, have shown a marked increase since 2016. Figure 5 shows the increase in production.
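The grouping of lines into the two sets can be sketched as a threshold rule over the number of works per line. In the snippet below, the function `classify_lines` is hypothetical, and placing a line with exactly ten works in the emergent set is an assumption, since the text only speaks of more than ten and fewer than ten. The count of 14 works for the first line is reported in the text; the count of 3 for the second line is illustrative only.

```python
from collections import Counter

THRESHOLD = 10  # boundary named in the text


def classify_lines(line_per_document, threshold=THRESHOLD):
    """Group research lines by their number of works.

    Assumption: a line with exactly `threshold` works is treated as emergent.
    """
    counts = Counter(line_per_document)
    sets = {"consolidated": {}, "emergent": {}}
    for line, n_works in counts.items():
        group = "consolidated" if n_works > threshold else "emergent"
        sets[group][line] = n_works
    return sets


# One label per document: 14 is reported in the text, 3 is an illustrative count.
labels = (["Relationship of results and associated factors"] * 14
          + ["Effect of evaluations"] * 3)
sets = classify_lines(labels)
print(sets)
```

Because the rule depends only on counts, the classification can be rerun for each year to follow how a line moves from the emergent set to the consolidated one.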

Relationship of results and associated factors
Fourteen works analyze factors associated with some form of impact on the results of teacher evaluations made by students. Among these factors are the point in the school cycle at which the evaluation is carried out [43], the size of the groups [52], the duration of the evaluation [44], the characteristics of the students who evaluate [18], learning results [19,22,106] and the teacher's characteristics [2,27,32].

Perceptions on evaluation processes
This line includes works on the perceptions of students [28,105] and of teachers regarding teacher evaluation from a student satisfaction approach [4,5,17,35,92,98]. In this line we also identified a study that evaluates the opinions of teachers about school failure; although the authors propose it as an evaluation, it was classified in this category because it concerns teachers' opinions [91].

Quality of evaluations
These works analyze the quality of evaluations with respect to the application conditions and the psychometric characteristics of the instruments with which students evaluate teachers. Validity and reliability [6,13,30,49], bias [85] and the application of theories of client satisfaction to evaluation [8] are analyzed.

Evaluation proposals
A set of works proposes methods, techniques, tools and theories to evaluate teachers. Questionnaires [51,53], portfolios [56] and rubrics [75] are proposed as evaluation instruments. The methods proposed include networks for algorithmic calculation and evaluation [63], hierarchical process analysis and decision making [71], analysis of relationships and interactions between teachers and students by identifying gaps [89], observation [93], training for criteria-based co-evaluation [75], self-evaluation [3], comparison of scales for teacher evaluation [24] and evaluation of practices through data mining [64]. As for evaluative perspectives, the phenomenological perspective [60] and evaluation with a formative approach [14] are included.

Emergent lines
Five emergent research lines on teaching evaluation were found. Interpretation of results reviews the meanings teachers give to the results of evaluations and the effect of the results of student evaluations on the social perception of teachers.
Effect of evaluations analyzes the impact of evaluations on various aspects, for example the improvement of teaching practice [55,61] or teachers' reflection [34]. There were also works that analyzed evaluations carried out by superiors and the participation of teachers in the evaluation process.

Conclusions
Mapping allows readers to establish their own exploration routes according to their research interests; it provides a general orientation in the literature selection process for a more in-depth review. However, this study has two limitations. The first is the unit of analysis: since we worked with metadata and abstracts, it is not possible to offer details of the findings of each work. The second is the source from which the analyzed works were obtained: although Scopus and WOS are indexes that house high-quality research production from around the world, there is a wider universe of production on the evaluation of teachers, mainly in Latin America, where teacher evaluation systems are still under constant review and discussion by research communities. Likewise, the normative and educational policy documents disseminated by governments have been excluded from this analysis.
On the other hand, the main strength of the mapping methodology is the possibility of having a general panorama in an accessible format over a vast production. Thus, the contribution of the mapping is the ordered presentation of a panoramic view of the scientific production on teacher evaluation published in the 2013-2017 five-year period, in high-impact journals, which offers the interested reader a synthesis of the main global production.
Significant differences were found between the volume of production in English and that in other languages. There were also differences in the amount of production on the topic generated in different countries. The United States is the country that makes the largest number of contributions; however, countries like Belgium and Australia seem to have more impact in terms of the number of citations within the scholarly community of teacher evaluation. It must be pointed out that the three outstanding countries have educational systems with high autonomy in their operation (OECD, 2013a, 2013b, 2013c). These data may point to new objects of study that link interest in the topic with the ways of conducting teacher evaluation and with research interests in specific contexts.
There is an underdevelopment of the lines of research that associate the evaluation of teachers with the training and professionalization of teaching or with the use given to the results of evaluations. Even works that present novel proposals for teacher evaluation do not seem to express an approach that clearly identifies with these purposes.
The scientific production on teacher evaluation in the period reviewed has focused on teacher evaluation from the perspective of the students, so there is a difference between the type of evaluation documented in the research and the evaluations carried out by the countries of America and Europe. Research in the field of evaluation seems to leave aside the recommendations of authors such as Warring (2015), Ávalos (2007) or Vaillant (2008), who point out the need to consider the results of evaluations only when they meet quality criteria and to reorient evaluation toward training purposes.
This mapping on teacher evaluation indicates that the difficulties identified by the scholars of the field are still valid in the development of evaluation processes. The topics identified in the literature classification confirm that the main difficulty of teacher evaluation continues to be the technical quality of the instruments and processes, which had already been expressed by Mateo (2000). This relationship between the literature and the analysis of researchers' interests indicates that quality remains an issue not yet overcome.
The present work represents, thus, an invitation to identify areas of opportunity in the teaching evaluation in specific fields of teaching work such as multimodal, distance, or b-learning environments, or massive open courses, and the study not only of the actors, but also of the very management and impact found around these processes.

Figure 3-Production trend in SCOPUS and WOS on teacher evaluation

Figure 4-Production on teacher evaluation by country

Figure 5-Development of lines of research during the January 2013-June 2017 period

Chart 1-Mapped production

educational management. María Soledad Ramírez Montoya is a professor-researcher at the School of Humanities and Education of the Tecnológico de Monterrey. She is the coordinator of the Research and Innovation in Education Group of the Tecnológico de Monterrey, director of the office of International Council for Open and Distance Education (ICDE) and director of the Open Educational Movement for Latin America UNESCO Chair.

Table 3-Search refinement

Table 6-Documents by type

Table 7-Documents according to their availability

Table 8-Review of open access journals
4 Scientific Journal Rankings. H index: an index used to assess the quality of journals according to the number of citations. A journal has an H index of h when h of its publications have each received at least h citations.

Table 10-List of countries and documents with the greatest impact, with number of citations

Table 11-Set of research lines on teacher evaluation