Classroom observation in context: an exploratory study in secondary schools from Northern Colombia

This paper discusses the role of classroom observations in informing debates on the quality of teaching and learning in secondary education. Specifically, the document proposes a methodology for classroom observation in context (CoC) to address many of the epistemological limitations of mainstream input-output observation models in relation to the professionalisation of educators. To observe in context entails working with a non-structured observation strategy to identify patterns in classroom events and subsequently opening spaces for collaborative dialogues (among observers and between observers and observees) to reflect on the potential mechanisms behind these patterns. The results of an exploratory study of CoC in Northern Colombia indicate the potential of such a strategy in informing education policy debates beyond the classroom setting.


INTRODUCTION
Since the publication of the McKinsey reports (MR) in 2007 and 2010, education reforms around the world have placed their focus on the quality of teachers (Coffield, 2012), with an emphasis on the use of data to manage their performance (Aud and Morris, 2014). The concept of good teaching is hence "reconfigured within a new scientificity as a clinical practice of standardised knowledge, and prescriptive knower dispositions" (Mooney, Moles and O'Grady, 2016, p. 1). Scholars like Coffield (2012) are, nonetheless, pessimistic that such policy frameworks will contribute to improving school systems, given the problematic assumptions behind that narrative. For him, "the authors [of the MR] adhere to the acquisition model of learning, where the minds of learners are viewed as containers to be filled with knowledge" (Coffield, 2012, p. 140). Evidence, he further argues, suggests that (at least for the case of England) most of the differences in the performance of students in exams are explained by their background, and not by the ability of instructors to deliver knowledge. 1 Consequently, he contends, "the belief in one right approach to teaching needs to be rejected" (Coffield, 2012).
To suggest that policymakers should reconsider their stereotypes about the performative nature of teaching 2 does not preclude the importance of reflecting on ways of improving the professionalisation of educators (O'Leary, 2012). However, research in the field suggests that the lack of a critical assessment of those stereotypes has resulted in policies of professionalisation that contribute to "a decline in the creativity and innovation of teachers' work in the classroom" (O'Leary, 2012, p. 807). A core argument of this article is that such a situation reflects a longstanding belief in a sharp distinction between the aims of research-based knowledge "that is published in scientific journals" and pedagogical knowledge, "which is used by classroom teachers in their day-to-day teaching" (Vanderlinde and Van Braak, 2010, p. 301). It is rather common to find that "[p]ractitioners make little (appropriate) use of educational research", as they consider it "limited in practical use" (Vanderlinde and Van Braak, 2010, p. 302). Such a contrast in people's beliefs might help in understanding the insistence of policymakers on pursuing education policy reforms that, according to academic evidence, are ill-equipped to foster real improvements in teaching and learning.
This article focuses on the debate on classroom observation as a case in point of the paradoxes mentioned in the last paragraph. Mainstream approaches to the topic tend to overemphasise measuring the skills of teachers to maximise productivity in the use of class time while leaving aside reflections about the causes of certain classroom dynamics. Questions about causal drivers, which some commentators would associate more with academic research, are often ruled out from the debate, thereby diminishing the possibilities to identify education policy initiatives that can foster educational change in different teaching and learning contexts (O'Sullivan, 2006). The first two sections of the article delve into some relevant elements of that debate by highlighting the epistemic tenets behind dominant approaches to classroom observations and by identifying some methodological requirements for overcoming their limitations. Section three profits from Wragg's (1999) and O'Leary's (2013, 2014a) notions of classroom observation in context (CoC) to delineate an observation strategy that reduces the gap between the methodological considerations of academic research in education and the demands of policymakers and practitioners for (certain) practicality in the applicability of observation tools to discuss day-to-day dynamics at schools. Finally, the last section of the article reviews some empirical outcomes of this proposal with observation data retrieved in three schools from Northern Colombia.
1 García-Villegas et al. (2013) discuss similar results in the Colombian case.
2 By performativity, Ball (2003, p. 216) refers to "a culture and a mode of regulation that employs judgements, comparisons and displays as means of incentive, control, attrition and change based on rewards and sanctions (both material and symbolic)".

WHY OBSERVE A CLASSROOM?
Classroom observation represents, at least in mainstream policy circles, a means of measuring productivity in education spending through the assessment of teaching practices. While one might identify nuances in the approaches used across educational systems around the world, "[one] of the underpinning issues traversing the different contexts and purposes of observation in schools is the notion of teacher effectiveness" (O'Leary, 2012, p. 793). A recent report of the World Bank Group, for example, recommended focussing on observational outcomes such as the teachers' use of instructional time and materials, the teachers' core pedagogical practices and the teachers' ability to keep students engaged (Bruns et al., 2015). This type of framework is dominant in classroom observation studies (Halpin and Kieffer, 2015) and implies working under the assumption that the "quality of an education system cannot exceed the quality of its teachers" (Barber and Mourshed, 2007, p. 13; Coffield, 2012).
Relevant for this literature is the debate on whether such types of observational standards represent a good approximation to study concepts such as the quality of teaching and learning in schools and its causes. For instance, scholars argue that the overemphasis placed on the causal assumptions supporting the teachers' effectiveness movement "fail[s] to consider many variables often beyond the control of the teacher that can affect students' performance in any given lesson" (O'Leary, 2012, p. 793). O'Sullivan's (2006) discussion on the misuse of lesson observation to inform education policy debates, beyond the individual learning of educators on methods and techniques, points out the way such practices end up omitting valuable information to foster those changes that the mainstream view expects to take place:
The use of lesson observation is not an innovative approach, yet, the literature suggests that it is rarely used in research and evaluation studies which seek to improve and assess quality in developing countries, and even more rarely […] to inform policy or in implementation efforts […]. Lesson observation can answer the "what" questions and illuminates the "how" questions, i.e. what is the current state of educational quality in schools and how can it be realistically improved with the available resources. It can also provide some insights into the why questions - why is the quality of education poor? (O'Sullivan, 2006, pp. 253-254)
To achieve the goal of addressing such concerns (and to the extent to which she refers to them), O'Sullivan (2006) points out the need to complement the data retrieved during the observation process with other sources, most notably interviews with teachers. That is important, she argues, "to more fully understand the teaching and learning processes currently being used and the extent to which particular processes are likely to be implemented" (O'Sullivan, 2006, p. 254). From that point of view, following O'Leary's (2012, p. 865) reading of Ted Wragg's classic introduction to the topic, observation "becomes part of a 'collection of data', thus recognising that it is one of several sources of evidence in the formal appraisal process". Epistemologically speaking, such an argument elaborates upon the complexities of the structure and agency debate in educational research (Clegg, 2005; Parra, 2017) and against the theoretical determinism that validates the teacher effectiveness stand in the debate, in which teacher effectiveness is treated as a single and universal determinant. O'Leary (2012, p. 808) presents such a dilemma in the following way:
[a]t the heart of these contestations lies a conflict between "structure" and "agency" and related notions of power and control, which manifests itself in the sometimes paradoxical agendas of policy-makers, the institution and its teaching staff. This conflict is epitomised by the way in which the developmental needs of staff and the requirements of performance management systems are forced to compete because they are often conflated into a "one-size-fits-all" model of observation in schools.
Delving into the implications of the structure and agency debate for educational research exceeds the scope of this document. This article endorses, however, the view that "complete explanations of social events and processes cannot be reduced to the intentions of agents without reference to structural properties or to structural forms without reference to the intentions and beliefs of agents" (Scott, 2007, p. 15). 3 Said differently, neither researchers nor policymakers can advance in identifying the causal drivers of, for instance, certain class dynamics (e.g., the learning of students) without referring to the properties of educational institutions and the way in which people react to those policy arrangements.
The assumption behind that last statement is that human beings (or, in this case, teachers and students) are not passive agents that act only according to the behavioural guidelines that institutions create for them (e.g., providing incentives for good teachers). Precisely because human agents are reflexive beings (Archer, 2013; Parra, 2017; Scott, 2005), the identification of the causal reasons why some incentives have success and some others fail in encouraging certain behavioural traits must consider the way in which people effectively interpret and act upon those incentives.
The next section of the document provides an overview of the existing typologies of observation methods and rationales to inform the paper's proposal of a general methodology that addresses the shortcomings of dominant practices in classroom observation. 4 The discussion highlights the way in which mainstream approaches tend to privilege deterministic views of teaching and learning, given their strong assumptions regarding the passiveness of educational agents in the classroom. The strength of Wragg's (1999) and O'Leary's (2013, 2014a) CoC, discussed later, lies precisely in the way it explicitly elaborates upon the reflexivity of educational actors as one of its core methodological tenets.

UNLEASHING THE POTENTIAL OF COC
O'Leary's (2006, 2012, 2013, 2014a, 2014b) extensive work in contrasting, both ontologically and methodologically, different approaches to classroom observation is a valuable point of departure for the current analysis. For this author, despite some possible nuances, one can identify two major traditions in the specialised literature on the topic. The first one he associates with Gosling's performance-driven models (PDMs) of peer observation of teaching, which vary from the straightforward application of structured templates to grade educators to the opening up of spaces for teachers to reflect and participate in their professional development. In the most refined version of PDMs, according to O'Leary's (2012, p. 806) exposition, "tutors observe each other as part of a formative process […] [serving] the dual purpose of promoting the development of [the] observer and [the] observee". The distinctive feature of these models is, once again, the assumption that a deterministic (or linear) relationship exists between teachers' effectiveness and students' performance.
O'Leary reiterates, however, how this mainstream approach to classroom observation "does little, if anything, to lead to an overall improvement in the standards and quality of […] teaching and learning" (O'Leary, 2006, p. 192). One underlying assumption of PDMs is that "the observed teacher is an account waiting to be filled with deposits of wisdom from the observer, thus suggesting an imbalance of power and expertise in the observer-observee relationship and the degree of autonomy afforded the latter" (O'Leary, 2012, p. 805). One problem with these restrictive approaches that place the summative outcome as the raison d'être of the observation, he writes, is that they conceptually disconnect the teacher's agency from the context in which teaching happens. That implies treating educational problems exclusively as personal problems of teachers as individuals (O'Leary, 2013). In line with the discussion of the last section, the fixation on what teachers should do, rather than on discussing the causal reasons why they cannot do otherwise, makes PDMs ill-equipped to foster educational changes at schools.
4 It is relevant to mention that, while the literature on lesson study seems appealing to address the conceptual narrowness of the school effectiveness wave because it "adopts a longitudinal approach […] on collecting data" (O'Leary, 2012, p. 806), that typology is not included in the general overview. That is the case because, as Lynn et al. (2018, p. 8) contend, there exists "no universally held understanding of, or explanation for, the process of observation, how it should be conducted, and who or what should be the principal focus of attention".
O'Leary associates a second tradition informing classroom observation techniques with Wragg's (1999) approach of CoC. Generally speaking, CoC is not a model, at least not in the ordering or deterministic sense of input-output approaches to classroom observation, but rather a way of categorising an observation "into the contexts in which it occurred" (O'Leary, 2012, p. 803). Such an observation rationale privileges semi-structured observation schemes, rather than structured templates, to collect "raw material upon which reflection is based and from which ideas are then generated" (O'Leary, 2014a, p. 110). Wragg's (1999, pp. 59-60) emphasis upon the logic of ethogenics carries, therefore, important epistemic value for the current study:
A central feature of the ethogenic approach is the understanding of episodes in social life. "Episodes" are sequences of interlocking acts by individuals. It is the task of ethogenics to elucidate the underlying structures of such episodes by investigating the meanings actors bring to the constituent acts […]. Although ethogenics is another variant of the view that renounces quantitative methodology, its supporters endorse the notion of taking a scientific approach to classroom observation and the understanding of what happens there. It simply rejects positivists' use of quantities to achieve scientific rigour, favouring an alternative form which seeks to elicit underlying structure by careful qualitative of sequences of events, rather than impose it by predetermined schedules and other instruments of observation.
Unlike some versions of PDMs, quantification in a CoC approach is useful not to measure standards, but to help researchers and practitioners identify patterns that indicate the operation of potential underlying mechanisms affecting teacher-student dynamics in a classroom setting. One key aspect of Wragg's proposal is that it seeks to generate "'a genuine situation of experience' […] upon which [researchers and practitioners can] reflect since this [is] likely to lead to more meaningful reflection[s]" (O'Leary, 2014a, p. 110) about why certain classroom dynamics, and not others, occur. That logic implies re-examining, for instance, the relationship between observers and observees, and also the role of the latter as passive recipients of feedback from the former, as the PDM scheme suggests. In O'Leary's (2014a, pp. 118-119) view:
[t]he emphasis here is on the word "co-construction" as both have a collaborative and reciprocal role in constructing their personal knowledge and understanding of teaching and learning. This does not mean to say that the two will necessarily share the same interpretation of events observed but that the dialogic process of making sense of those events will be shared in a way that enhances personal meaning. Rather than one person controlling the acquisition and production of knowledge, a shared, dialogic approach necessitates the negotiation of meaning between the two parties, and it is during this negotiation of meaning that enhanced awareness and understanding often emerges. (O'Leary, 2014a, pp. 118-119)
Now, the introduction of this article referred to a paradox reproduced by mainstream approaches to the professionalisation of educators, a task in which classroom observation represents a hallmark practice.
The paradox lies in the fact that, in the quest for practical solutions to improve the quality of teaching, education policies promote professionalisation practices that end up squandering the creativity and innovation of teachers in the classroom (O'Leary, 2012, p. 807). 5 It was also suggested that such a situation is a reflection of a general belief among policymakers and practitioners about the lack of practical use of academic research in the field of education. The authors of this article suggest that the features of the CoC approach to classroom observation described above represent a valuable contribution to this debate. Broadly speaking, "Wragg's decision to categorise observation with reference to contexts rather than models also suggests that this is a more meaningful way of configuring it as it avoids the blurred boundaries between [PDMs]" (O'Leary, 2012, p. 805). More specifically, its potential lies in the way it elevates the scrutiny of teacher's agency as a fundamental category of analysis, without disregarding the production of quantitative data (of classroom episodes) that can be of practical use to discuss topics such as the use of class time.
The next section of the article presents a classroom observation proposal to encourage the reflexivity of educational agents to delve further into the meaning and the potential causes of classroom dynamics at schools. The centrality of the agency of educators in the analysis implies bringing academic rigour to the observation strategy (to study, for instance, the causes behind educational phenomena). Nonetheless, the ontological elements from the structure and agency debate outlined in section two suggest that a complete causal explanation of any social phenomenon requires addressing both the subjective (e.g., the agency of teachers) and objective (e.g., the properties of educational structures) dimensions of reality. Most likely, to succeed in the task of producing deep causal knowledge, researchers will have to contrast observational and conversational material with other types of sociological data related to the educational systems they seek to examine (Archer, 2013). 6 That second task exceeds the scope of the current study. However, the usefulness of non-standardised analyses, such as a CoC-based strategy, resides in the way they can contribute to "[moving] policy-makers away from the reliance on input-output conceptualisations of quality toward a commitment to a context-focused teaching and learning process perspective" (O'Sullivan, 2006, p. 258). 7
5 In O'Sullivan's (2006, p. 252) experience, such a paradox is also a result of policy efforts that prioritise notions of standardisation under the sole rationale that they are easier to handle than those that advocate studying educational challenges in different contexts. In her words, "identifying and measuring teaching and learning indicators requires time to be spent in classrooms and is seen to be expensive. In comparison, input indicators are considered to be relatively easy to measure and make digestible for policy-makers and others".
6 Such an implication emerges from a realist ontology regarding the structure and agency debate in education policy research. According to Archer (2013, p. 55), "[a]ctors react to the situations in which they find themselves; they may remain unaware of the factors which moulded such situations or of some of their properties. These socio-educational contexts must be investigated independently, and in doing so the sociological task is not just to record how they were viewed by people at the time but also to conceptualize how this broader context structured the actual situation in which each group found itself vis-à-vis education".

A MULTISTAGE METHODOLOGY FOR COC
The following proposal for CoC builds explicitly on Wragg's (1999) understanding of ethogenics applied to classroom observation techniques. O'Leary's (2014b) classroom observation guide provides further methodological insights into how to design such a strategy, without leaving aside valuable elements from the cumulative corpus of knowledge (e.g., technical insights) in that disciplinary field. Details about the following steps also emerged from the experience of the researchers in observing classrooms in Northern Colombia (see next section). This clarification aims to convey the message of the importance of building problem-driven, and not solely method-driven, strategies to study social phenomena (Scott, 2005).
Generally speaking, much of the current strategy elaborates on O'Leary's (2014b) discussion on the work of scholars influenced by John Dewey's critical thinking. Critical reflection, Dewey contends, helps in making the transition "from being concerned with instructional techniques or […] the 'how to' questions, and to concentrate on the more important 'what' and 'why' questions" (O'Leary, 2014b, p. 111). The collaborative reflection process unfolds in three major stages. These are: "1) the event itself, i.e., an actual teaching episode; 2) recollection of the event, i.e., an account of what happened without explanation or evaluation; and 3) review and response to the event, i.e., processing at a 'deeper level'" (O'Leary, 2014b, p. 111). For the sake of clarity, the section now proceeds with the presentation of these steps as part of a general classroom observation strategy.

STAGE ONE: IDENTIFICATION OF CLASSROOM EVENTS (OR EPISODES)
One specific aim of this stage is to create a sequence of non-structured events to help researchers capture general classroom dynamics. Wragg's (1999) description of static sampling meets the basic requirements of this purpose quite well. According to his textbook,
Some observers build up a series of snapshots of a lesson, a little bit like a time-lapse series of photographs. This means that they code what is happening at some regular interval, perhaps at the end of every minute. Suppose the observers were studying four individual children, two boys and two girls, then at the end of each period there would be a record of exactly where each child was and the nature of their activity at precisely that moment. The advantages of this kind of sampling are that it allows twenty or thirty such snapshots to be collected in quite a short time, preserves the sequence of events, and permits analysis that is not too time-consuming. (Wragg, 1999, p. 34)
One methodological proposal for this phase is for observers to fill in, during the observed session, a spreadsheet with basic descriptors of what happens (e.g., the teacher explains a topic, students pay attention) at regular time intervals (e.g., one description every five minutes). In this phase, it is also important to consider O'Leary's (2014b) comments on the technical challenges of collecting non-structured data without disregarding basic reliability issues. One of them is the possibility of experiencing a Hawthorne effect, which means that teachers and students, knowing that they are being observed, might alter their behaviour to make a good impression. A second relevant consideration is that observing one single classroom session might not be enough to identify the most typical classroom events. This may be for many internal (e.g., teachers deliver a rehearsed lesson) or external (e.g., a particular extracurricular activity shaping the content of that particular lesson) reasons.
A CoC strategy can address such challenges by planning more than one observation session per classroom.
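As an illustration only, the interval-based record described above could be kept in a very simple data structure. The following Python sketch is not the instrument used in this study: the field names, the five-minute interval and the category labels ("exposition", "individual work", "class management") are hypothetical, and the categories are assumed to be assigned by observers after the session rather than imposed beforehand by a structured template.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Snapshot:
    """One free-text record taken at a regular interval (hypothetical layout)."""
    minute: int        # minutes elapsed since the lesson started
    observer: str      # who wrote the record
    description: str   # non-structured note of what was happening
    category: str      # label agreed by observers after the session


def minutes_per_category(snapshots, interval_minutes=5):
    """Approximate class time per category: snapshots counted x interval length."""
    counts = Counter(s.category for s in snapshots)
    return {category: n * interval_minutes for category, n in counts.items()}


# Illustrative records for a single session (invented, not study data).
session = [
    Snapshot(5, "A", "Teacher explains a topic, students pay attention", "exposition"),
    Snapshot(10, "A", "Teacher explains, some students talk among themselves", "exposition"),
    Snapshot(15, "A", "Students copy an exercise from the board", "individual work"),
    Snapshot(20, "A", "Teacher calls for silence", "class management"),
]

print(minutes_per_category(session))
```

Keeping the free-text description next to the later category preserves the sequence of events and the raw material for reflection, while the tally offers only a rough picture of the use of class time.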
An additional element for consideration concerns the question of who should observe classrooms and record events. On the one hand, it is clear that observers might focus on describing different events happening at the same time (O'Leary, 2014b; Wragg, 1999). One immediate solution for this concern is to assign more than one observer per session and invite them to arrive at some basic agreements on the types of events they find important to report on during one specific lesson. Such a step entails introducing a first stage of group reflexivity about the meaning of classroom dynamics. On the other hand, it is legitimate to consider the level of expertise expected from those collecting observational data in schools. Within the PDMs' paradigm, this question is even more relevant, in the sense that observers should have certain characteristics (e.g., their level of expertise as teachers) that will make them better at evaluating the educators they observe. However, working with an ethogenic approach, which is less standardised, entails that "[e]ven classroom observers who have no intention of reading a single book about the topic would do well to recognise some of the precepts on which they are founded" (Wragg, 1999, p. 60). The empirical section of this document details the characteristics of the team that was recruited to observe schools in Northern Colombia and the criteria, as opposed to the expertise of the observers, that guided the exploratory study in this specific setting.

STAGE TWO: RECOLLECTION OF CLASSROOM EVENTS
Both quantitative and qualitative methods are important to generate valuable data on classroom dynamics. Put plainly, "[w]hile the counting of events may offer some interesting insights, it falls far short of telling the whole story of classroom life" (Wragg, 1999, p. 10). However, the first descriptive picture is valuable in itself as it represents a basis with which different educational actors can engage in a collaborative conversation about teaching and learning in secondary education (O'Leary, 2014b). Such a conversation, following Wragg (1999), can profit from a previous discussion between observers to prioritise the events that caught their attention the most and describe them further with other members of the observation team (e.g., with those observing other sessions). For instance,
[t]he observer looks for specific instances of classroom behaviour which are judged to be illustrative of some salient aspect of the teacher's style or strategies: an element of class management, for example, perhaps a rule being established, followed, or being broken, something that reflects interpersonal relationships or some other indicative event. […] Critical events need not be spectacular. They are simply things that happen that seem to the observer to be of more interest than other events occurring at the same time, and therefore worth documenting in greater detail, usually because they tell a small but significant part of a larger story. (Wragg, 1999, p. 67)
Team meetings can hence be useful in enabling a first cross-check of the data, and of the perceptions about it, among researchers, encouraging processes of group reflexivity about the meaning and the possible causes of observed classroom events (Wragg, 1999). This phase helps to provide more details about the way events, as registered in phase one, take place. For example, while one can descriptively categorise one event as a case of "bad discipline", a wider description can illuminate the existence of forms of discipline (e.g., not paying attention, disrespecting the educator) that better represent the teacher-student relationship in a specific setting. If meetings are carried out in between observation sessions (given the importance, once again, of observing one classroom more than once), such reflections can also inform observers about particular elements that are potentially interesting in the next round of observations.
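The cross-check between observers mentioned above can be supported by a simple comparison of their records. The Python sketch below is a hypothetical aid for preparing a team meeting, not a procedure reported in this study: it assumes that each observer's log maps the minute of a snapshot to a short label, and it merely lists the moments where the two records differ so that the team can discuss them. It does not compute any agreement statistic, since the purpose is to open a conversation rather than to grade the observers.

```python
def intervals_to_discuss(log_a, log_b):
    """List the minutes at which two observers' records differ.

    Each log is a dict {minute: label}; a missing minute is reported as None.
    Both the layout and the labels are assumptions made for illustration.
    """
    flagged = []
    for minute in sorted(set(log_a) | set(log_b)):
        label_a = log_a.get(minute)
        label_b = log_b.get(minute)
        if label_a != label_b:
            flagged.append((minute, label_a, label_b))
    return flagged


# Invented logs from two observers of the same session.
observer_a = {5: "exposition", 10: "exposition", 15: "individual work"}
observer_b = {
    5: "exposition",
    10: "class management",
    15: "individual work",
    20: "class management",
}

for minute, label_a, label_b in intervals_to_discuss(observer_a, observer_b):
    print(f"minute {minute}: observer A = {label_a}, observer B = {label_b}")
```

A divergence flagged in this way is not an error to be corrected; in line with the ethogenic logic, it marks an episode that observers may have read differently and that is therefore worth describing in greater detail.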

STAGE THREE: TOWARDS THE WHY QUESTION
In this last step, the researcher looks for further indications about the possible drivers that shape current classroom dynamics (as gathered during stages one and two of the process). These are only indications, given the epistemological arguments raised before on the need to complement descriptions and perceptions with other types of sociological and historical data. Along these lines, one straightforward methodological assertion indicates that qualitative analysis in this type of research requires the classroom observer to understand and explain how teachers act, usually by first observing and then interviewing. Teaching is such a rapidly moving set of activities, that the way in which teachers, and for that matter pupils, see and interpret what happens, is often neglected. Observations and interviews allow the taken-for-granted to be explored in greater detail. (Wragg, 1999, p. 55) Nonetheless, according to O'Leary (2006, p. 193), "[i]n reality, little opportunity is given to teachers to […] participate in the observation process [actively]". This means that mainstream observation strategies rule out contextual information that is useful in the whole reflective process to identify potential policy changes to improve teaching and learning in schools. For this task, Wragg (1999) suggests that researchers holding such interviews (or conversations) should avoid using judgemental language to prevent misunderstandings with teachers and invite them to participate in a dialogue. It is advisable, hence, to "[use] neutral language like 'Can you tell me about…?', rather than loaded or leading questions, such as 'Why didn't you…?'" (Wragg, 1999, p. 68). As discussed in the empirical section of this paper, these considerations are relevant in contexts where there are important levels of mistrust between educators and educational authorities.

SOME INSIGHTS FROM COC IN NORTHERN COLOMBIA
This section of the document presents some insights from the application of the CoC methodology in Northern Colombia. The empirical inquiry profited from a 10-month (from August 2014 to June 2015) ethnographic research project in three schools from the Caribbean region of the country that are failing from the perspective of their students' results in national exams. Within this general context, the focus of this study is to provide a replicable tool to observe classrooms in context across different settings. Given the sample size, the study is of an exploratory nature. However, as Colombian policymakers embrace the assumption that "[t]eachers are the most important determinant in the learning of students" (García et al., 2014, p. 89, translated quotation), the case study raises relevant elements for the debate on the tensions emerging from the application of acquisition models of learning at schools.

SITUATING THE EMPIRICAL INQUIRY
Studying schools in Colombia entails entering a contested terrain. For example, there is a dominant narrative among teaching unions on the need to become "more protective in light of an increasingly market-oriented environment [in education]" (Gindin and Finger, 2013, p. 25; Rodríguez, 2015). Contrary to the optimism expressed by scholars and policymakers around the efficiency in the sector sponsored by law n. 715 from 2001 (Barrera-Osorio, Maldonado and , educators in schools, most of whom are members of Atlántico's Teaching Association (ADEA), feel threatened by the notion of assessment as set out in that law (Parra, 2017). 8 One consistently reproduced idea among members at various union levels (from its executives to teachers in schools) is that evaluations lack a clear intention to provide valuable feedback for teachers' professional development. On the contrary, they see evaluations as a managerial device to limit teachers' chances of climbing in their pay scale (for references on similar experiences internationally, see Ball (2003) or Roberts-Homes (2015)).
The narrative of bad teaching in the country commonly appeals to the lack of good incentives for good teaching, including underpayment (teachers in Colombia have one of the lowest civil service salary scales) and the lack of effective performance monitoring. Currently, decrees n. 2277 from 1979 and n. 1278 from 2001 regulate the teaching profession in the country, each decree grouping educators according to the date on which they began their teaching career (Cifuentes, 2014).
Younger teachers, most of whom fall under decree n. 1278, need to pass national exams to progress in their pay level, while older teachers do not. Because of this, many voices point out that increasing the quality of learning will be possible after a generational transition, once all teachers fall under decree n. 1278, as then all teachers will have better incentives to continue strengthening their teaching skills. Much less attention is given to the role of students and their households in the whole process; some notable exceptions are the reflections prompted by Palacios (2013) and Cajiao (2014). This may partly be because mainstream work on school effectiveness in the country assumes that the background of families is beyond the possibilities and explicit responsibilities of the Ministry of Education (Parra, 2018).
Regarding the specific debate on classroom observation methodologies, it is fairly easy to trace the way in which education policies in the country reproduce many aspects of the PDMs' paradigm. For instance, one of the biggest programmes from the Ministry of Education to promote the professionalisation of teachers (the programme "Todos a Aprender") sees "educators as the main bet to improve the practices in the classroom" (Díaz, Barreira and Pinheiro, 2015, p. 55, translated quotation). Even though the programme has different components, including some strategies to support the improvement of the infrastructure of schools and the strengthening of their managerial practices, the observation method relies mostly on the measurement of standards in the classroom. 9 Epistemologically speaking, such a situation implies a challenge for the country to migrate to different observation methodologies to transcend the weaknesses of PDMs.
Table 1 summarises the characteristics of the sample of observed classrooms in the three schools in this study. While the objective is not to establish statistical comparisons, the differences between some of the variables show some of the complexities of the Colombian education system described above. For instance, of the nine observed classrooms, five teachers fall under decree n. 2277, with the remainder falling under decree n. 1278 from 2001. Observed subjects included Mathematics, English and Spanish, all considered core subject areas in school curricula in the country. The decision to work with senior students is based on the assumption that most of them had lived their entire school experience in the same institution and, hence, had (most likely) normalised their relationships with their teachers and peers. However, the project ended up covering 10th graders (instead of 11th graders), due to earlier warnings from teachers and school directors that senior student classrooms devote most of their time to training students to take the standardised tests.
As for the conflicting relationships between teachers and the government, some methodological and ethical considerations helped the research team to enter the classrooms. For instance, as this observation project forms part of a broader ethnographic study in the region (around ten months, in total), observations took place only in the last phase of the fieldwork (between May and June of 2015). In previous visits to schools, researchers worked on building trusting relationships with school community members. Likewise, the participation of teachers in the project was voluntary and involved preliminary discussions with each of them during which researchers shared a short document containing all the details of the observation process (e.g., methodology, observation templates, objectives). That same document was presented to the heads of ADEA in the city of Barranquilla (the capital city of the Department of Atlántico).
Finally, all classroom observers were undergraduates from a Normal School in one of the municipalities. Normal schools operate like primary and secondary educational institutions that offer two additional years of vocational training for students who want to be certified as teachers at basic and medium levels of education. The observation team consisted of four female and two male students in the last years of their training to become professional teachers. The lack of gender balance in the observation team reflects the feminised status of the teaching profession in the country, as there were more female candidates available to make up the team. The entire recruitment process of the observation team took place with the support of the managers of the Normal School vocational training programme, and it was agreed that teacher training students participating in the project would get credits for some of the practice hour requirements needed to get their professional degree.

Stage one
In the empirical exercise, the research team was split into pairs, and each pair had the responsibility to observe the same subject in the three schools. Hence, one pair observed Maths lessons in all three schools, another pair observed only Spanish lessons, and so on. To create a sequence of non-structured events, each observer had a spreadsheet with spaces to fill out with general descriptions of what was happening in the classroom at five-minute intervals. As each lesson was programmed to take 50 minutes, the expectation was to collect around ten descriptions of events per session. The instruction for observers was to meet after each session to compare their notes and generate one single form of events per session. Basic descriptors of the whole observation process confirmed Posada's (2009) notes on his lesson observations in schools from the region, stating that the time devoted to class is insufficient and inefficient. For example, the average class session lasted 39 minutes (78% of the ideal class time), 29% of the lessons started late, 15% ended before time, and 26% were interrupted by external factors (e.g., announcements from the coordination office). Figure 1 and Figure 2 below show in detail the distribution of the 212 descriptions of classroom events in the different schools across the three observed subjects. The initial coding frame to classify these events was taken from Schoenfeld's (2014) rubric to observe Maths lessons. The resulting categories stemmed from an iterative exercise that consisted of testing and refining such codes until their saturation (Schreier, 2012). For the sake of clarity, the information was divided into teachers' events and students' events.
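The arithmetic behind the basic descriptors reported above can be illustrated with a short sketch. This is not the procedure used in the study: the `Session` record, its field names and the four sample sessions are hypothetical and invented purely for illustration. Only the 50-minute programmed lesson length and the five-minute recording interval come from the text.

```python
from dataclasses import dataclass

PROGRAMMED_MINUTES = 50  # programmed lesson length (from the text)
INTERVAL_MINUTES = 5     # recording interval for event descriptions (from the text)


@dataclass
class Session:
    """One observed lesson (hypothetical record layout)."""
    minutes: int        # observed duration of the lesson
    started_late: bool
    ended_early: bool
    interrupted: bool   # e.g., an announcement from the coordination office


def expected_descriptions(programmed=PROGRAMMED_MINUTES, interval=INTERVAL_MINUTES):
    """Number of event descriptions expected per session."""
    return programmed // interval


def summarise(sessions):
    """Compute basic descriptors over a list of observed sessions."""
    n = len(sessions)
    mean_minutes = sum(s.minutes for s in sessions) / n
    return {
        "mean_minutes": mean_minutes,
        "share_of_programmed_time": mean_minutes / PROGRAMMED_MINUTES,
        "share_started_late": sum(s.started_late for s in sessions) / n,
        "share_ended_early": sum(s.ended_early for s in sessions) / n,
        "share_interrupted": sum(s.interrupted for s in sessions) / n,
    }


# Invented sample data, NOT the study's records.
sessions = [
    Session(40, True, False, False),
    Session(38, False, False, True),
    Session(45, False, True, False),
    Session(33, False, False, False),
]

summary = summarise(sessions)
print(expected_descriptions())
print(summary)
```

With a mean duration of 39 minutes, the share of programmed time is 39/50, which corresponds to the 78% figure quoted in the text; the remaining shares depend entirely on the invented sample.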
Both graphs reveal some patterns. For instance, most teachers spend their time either lecturing or giving instructions (Figure 1). Particularly in the case of maths, teachers also spend an important part of their time disciplining students. From the perspective of students (Figure 2), two categories that stand out are bad behaviour and disrespect to the teacher, both of which refer to a negative attitude expressed by students during class time. Students also showed signs of active participation, which are evidence of some level of engagement during the class. Together, both graphs show relevant relational dynamics for this study. For instance, expressions of good behaviour seem to go together with the use of class dynamics by teachers, and with teachers making reference to the content of the last session or devoting time to giving instructions on class contents and assessments. Then again, disrespectful attitudes towards the teachers tend to be more present in settings where teachers do not react by disciplining students. At the same time, lecturing time seems to discourage the active participation of students, a category that shows a positive relationship with higher frequencies of events representing teachers answering questions.
Notes (Figure 1): Categories preceded by a negation (not) describe situations in which there was an event to which the teacher did not react. For example, not-disciplining describes an event in which students were misbehaving, and the teacher did nothing about it. Source: Research database. Elaboration of the authors.

Stage two
This stage aims to identify broader descriptors of class dynamics, emphasising the most frequent class events. Table 2 provides a general description of the results of this stage, now informed by a content analysis (Boyatzis, 1998). The number of nodes, in this case, means the number of quotes (or descriptors) that saturated each category according to the transcripts of team conversations (using NVivo). In the rest of this document, all phrases cited as quotes are excerpts from conversations and interviews.
Note (Figure 2): Categories preceded by a negation (not) describe situations in which there was an ongoing issue and yet the student did not react. For example, not active participation means that there was a class event in which teachers asked a question to the class that students did not answer. There is also a distinction between classroom events reflecting bad behaviour and disrespect to the teacher. In the coding frame, the former accounts for episodes in which students were exhibiting aggressive behaviour towards their peers. The category of good behaviour captures moments in which the observer expressed that "students are well-behaved". Source: Research database. Elaboration of the authors.
Observers were mostly interested in discussing teaching methods (how teachers deliver information). One category that seems related to effective learning is the way in which teachers encourage, or not, participation. Observers provided some descriptions about this issue which exhibit tendencies towards restricting students' participation. However, they also found exceptions to this dynamic, as illustrated by the next quote, included here because it conveys a notion of good practice according to educators (the observers themselves) in the region:
[the teacher] started to ask students about their memory of what an argumentative text is and told them to break the concept into its two [compositional] words: "text" and "argumentative", to make a case. She gave examples from day-to-day life in the classroom as well as some related to the day-to-day lives of the students. By these means, students were participating during almost the entire class and asked her questions, and she asked questions back […]. It was a little more dynamic because they were all interacting with each other.
However, this is not representative of most classrooms. For instance, in some of them, "male students [participate] little and have no leadership", while in others, "female students do not participate […] they were there staring elsewhere […] making gestures of boredom". In fact, it is more frequent to see limited student engagement in classroom activities, including, for example, a reluctance to submit homework or to answer questions asked by teachers. One of the main challenges related to this is the predominant problem of poor discipline, which undermines teaching-learning dynamics. From discussions with observers, disciplinary issues often emerge as a result of students being eager to leave the room or as a reaction to particular situations that they use to disrupt order:
The teacher was trying to gain back control, but sincerely, in that case, it was almost impossible. It was the last hour […] Students said that it was hot, they were hungry, sleepy [and] were worried about lunch.
There was a lot of disorder at the moment in which students were paying 200 pesos 10 [for copies of material]. Students jumped around.
From these quotes, it is evident that physical or environmental conditions can be factors of disruption. For instance, students are hungry or feel uncomfortable with the conditions of the classroom (e.g., the lack of ventilation). Both situations seem to exceed the possibilities of teachers to keep students engaged. Also, the situation of a teacher collecting money for copies shows how the school's failure to supply materials can have a negative impact on class dynamics (e.g., through distractions that lead to disorder). Another relevant element linked to this discussion is the way teachers discipline their students, and how timely their reactions are. Here observers recorded levels of heterogeneity, from teachers who do nothing in reaction to students' negative attitudes, to some who are more skilled in regaining control:
He did not run into any complications […] he simply told them that he still had to decide on their final grade […] [the] students walked around the room and stayed still at their desks.
Things do not go bad because immediately after the teacher corrects them and if [they look for] a way to sabotage [the class] she scolds them.
The strategies used by educators to persuade students to modify their attitudes deserve further consideration. Some teachers ignore students. Some teachers threaten students with bad grades. During one conversation with observers, one group pointed out that one educator reminded students how bad attitudes are reflected in unsatisfactory results in standardised exams: "[she] told them that the low performance in ICFES [standardised test scores] was […] due to […] indiscipline". In contrast, in one specific case, lesson observers reported constant levels of good behaviour and student engagement. Here they emphasised the teacher's skills in keeping the students engaged by keeping them busy (e.g., lots of group activities) and by reacting promptly "to stop them if they tried to sabotage the class".
One last element worth mentioning is the great number of class interruptions occurring in these particular schools. Lesson observers noted, for example, the occurrence of extracurricular activities during class time (e.g., a soccer tournament) or one particular case in which the teacher stayed in the teachers' lounge while she was supposed to be in the classroom. In one school, observers noticed how the number of early school meetings of the whole school community on the football pitch negatively affected class time. These meetings occurred weekly, meaning that children lost at least half an hour of a particular class every week.

Stage three: towards the why questions
In this last stage, the observation team was instructed to identify, for each subject, the three events that best summarised what they saw in each classroom (that is, the type of class events that most caught their attention) and to conduct interviews with teachers to explore those specific observations with them. The questions posed by the observers introduce interesting elements for reflection. An evident focus of their concern is discipline in classrooms and the need to encourage class dynamics and participation. It is also worth mentioning that four of the nine interviewed teachers received an explicit mention from the observers about their good teaching practices. However, in these four cases, the interviewers also referred to the behaviour of students and how teachers need to do something about it. In the sole case where the educator received only positive comments on her skills in maintaining control over her group, she did not hesitate to point out that the group observed was one with which she had already built a long-standing relationship after working with them for many years. That was her way of clarifying that this was not the norm.
So why do children show certain attitudes and behaviours? Most of the educators referred to schools' specific constraints, such as the lack of proper learning spaces, the lack of texts or books and the unsuitability of the school infrastructure for the region's warm weather. The following two interview excerpts exemplify these situations and highlight relevant elements related to them. For instance, problems in this regard do not only mean a lack of pedagogical teaching resources but also appear to affect students' willingness to study because they feel uncomfortable in the classroom:
Observation team (OT): Is it frequent for the class to finish before due time?
Teacher (T): In the last hours of the day we have to do this because sometimes they feel overwhelmed, they become desperate, but during the first hours they do wait for the bell […] the last hours in school are a little traumatic. First, we see the heat […] Ventilation is not completely working […] They should be having their breakfast around half past six [in the morning].
Moreover, during the sessions, observers mentioned some episodes in which the teacher had to collect money from students to pay for copies of written material. One of the teachers explained that such money comes from teachers' pockets, and not from the school's budget.
Observation team (OT): For us, the students, [English as a second language] is not a priority subject. However […] one of the recommendations we [would like to share with] you, is to use more didactic materials and more ludic activities to wake up their motivation and participation.
Teacher (T): the thing is that we lack didactical material, so we have to [buy it] and sometimes this is from our own pockets.
Interviews also revealed that some strategies used by schools to prevent internal problems create other challenges. Consider, for instance, the following response from one of the interviewed teachers, when asked about discipline in his classroom:
Observation team (OT): Another thing that caught our attention was the behaviour of students.
Teacher (T): Well, hmm […] I think that I have already commented at one point that you chose one of the hardest groups we have in 10th grade.
That was not the first time that teachers referred to groups as being problematic as compared to other groups. In informal conversations with school coordinators, they explained to the research team that students are split into groups from the first year of secondary school according to criteria such as their age. Nonetheless, apparently, some groups of students systematically lag behind, partly because they become stigmatised as the children that are harder to teach.
Educators also mentioned issues that we will typify as external to the school, for instance, the lack of support from parents and the role of the general socio-economic conditions of families in shaping students' attitudes (considering that many of these children live with relatives other than their parents):
Observation team (OT): Why do you think that the students show such a lack of interest in the English class?
Teacher (T): About that, we need to be very clear in mentioning that the problem that affects students in all senses today is the actual education they receive from their parents. There we find apathy and the [origins of] the lack of respect. We no longer have students with real values.
In the last part of the interview, observers were instructed to ask specifically about the teachers' perceptions of current education policies, and the way these might affect observed classroom dynamics. The following quotes summarise the most recurring views on this issue:
There are a lot of positive things, for instance, in the area of English language we have received support from the government […]. The students themselves show themselves to be reciprocally motivated towards the English language, but, for instance, I have a class with 40 motivated students, and I think that such a number is too high to work in the way it should be done.
These quotes reinforce previous arguments given by teachers (such as the shift in values among generations) and introduce new elements into the discussion. One of these is the duality between access and quality, which is a common argument in policy discussions. A second element corresponds to the no children left behind culture 12 and the way it diminishes the incentives to study hard, according to educators. Thirdly, educators refer to the way in which other government initiatives to counteract poverty (such as conditioned cash transfers) have modified parent-teacher relations. In the past, educators would argue, parents showed more respect for the authority and the role of teachers. Today, teachers note, parents have other incentives such as having access to state subsidies.

FINAL REMARKS: OPENING THE SCOPE OF EDUCATION POLICY DEBATES
This article provides a brief overview of classroom observation techniques as a hallmark in education quality debates. The discussion in the first sections highlights a paradox surrounding the implementation of mainstream observation techniques in the context of the professional development of educators: in their eagerness to promote more effective teaching practices at schools, policymakers and practitioners end up sponsoring standardised tools that contribute to a decline in the creativity and innovation of teachers in classrooms. The paper aims to make progress in the design of classroom observation tools that contribute better to real improvements in teaching and learning practices at schools. To achieve such a goal, the structure and agency debate applied to educational research suggests that classroom observation techniques should place an important focus on the reflexivity of educational actors.
The implementation of a new proposal for CoC helped researchers to become aware of factors that potentially shape teacher-student interactions in the observed classrooms. Firstly, descriptions of events and the team's reflections about them (in stages one and two) indicate that the lack of discipline and proper behaviour and the limited engagement of students in the class are two prevailing features of classroom dynamics in these schools. The levels of heterogeneity in teachers' engagement and skills to encourage students indicate, however, that not all teachers are poorly prepared, as mainstream school effectiveness research suggests. Some teachers do their best with the scarce resources they have. Some others express their frustration and hence do little to become more active educators. Schools have initiatives to deal with problematic students, such as grouping students by age (a variable that is often correlated with levels of engagement of students), which sometimes end up generating stigmas (e.g., the problem child) that can negatively affect the learning process of specific students. Secondly, combining sources of information allows building hypotheses about the potential structural forces acting upon teachers and students at the research site. Broadly speaking, the poor engagement of students is related to the conflict between the educational standards, the structure of families, and the lack of opportunities they see in a good education (quoting one teacher, "some are stagnated at home and do not build a career"). Such an argument also finds support in the fact that families seem to be more enthusiastic about the compensation for sending their children to school than about actually fostering their learning processes (by establishing, for example, good relationships with teachers). The previous situations and conditions, added to the lack of basic requirements (e.g., proper infrastructure), are important barriers for teachers to exercise their agency as educators.
12 It is important to mention, however, that decree n. 1290 from 2009 modified decree n. 230 from 2002, which established an upper limit of 5% regarding failing students. The new legislation does not set a specific standard, but gives autonomy to schools to define their evaluation criteria.
What is more interesting here is, however, acknowledging the potential of a simple, but ontologically well-equipped, tool such as CoC in providing valuable knowledge for researchers and policymakers about the real challenges of education policy in a country or society. One advantage of this methodology is that it can still provide valuable information to policymakers to keep records of class dynamics (something that the neoliberal education agency demands of them; Ball and Olmedo, 2013), but without narrowing the scope so excessively as to deter the study of the structural features of education policies at work. Likewise, while it is clear that CoC is ontologically limited in its capacity to provide the whole causal story of why schools fail, such a tool can contribute importantly to helping policymakers and practitioners to "break free from the assessment straitjacket that currently constrains" (O'Leary, 2014b, p. 220) the possibilities to identify new effective ways to transform educational systems for the benefit of teachers and students.