The use of cognitive instruments for research in early childhood education: constraints and possibilities in the Brazilian

This paper discusses the adaptation of the iPIPS assessment for use in research with Brazilian children between the ages of four and seven. It debates the importance of having a baseline measure to assess early childhood education policy as well as the advantages of collecting high-quality information about children's development. Not knowing how children are progressing could harm disadvantaged children and increase school inequality. The data used in the analysis came from the 2016 pre-test of iPIPS for mathematics and language, with a total of 560 cases collected in three Brazilian cities. The preliminary analyses indicate that the items of both tests present adequate behavior, suggesting the theoretical adequacy of the items and a good adaptation and application protocol.


Introduction
It is widely recognized that children's early development and their progress during the first years of school are crucial for their later success (Peisner-Feinberg et al., 2001; Sammons et al., 2008; Sylva, Melhuish, & Sammons, 2010; Sylva et al., 2006; Tymms, Jones, Albone, & Henderson, 2009). Measuring and monitoring children's development at this key stage of their lives should therefore be a concern for policymakers. Evidence to guide educational policy is important, and this begins with understanding what children know and can do when they start school in their own country. This understanding sets out a Learning Path to inform curriculum development and a framework against which the impact of interventions may be evaluated. An understanding of how the development of children in one country compares with that of children in other countries at the start of school life can provide policymakers with information to help them evaluate the effectiveness of both preschool and educational policies.
e-ISSN 1980-6248 http://dx.doi.org/10.1590/1980-6248-2018-0036 Pro-Posições | Campinas, SP | V. 31 | e20180036 | 2020
In order to gain an understanding of young children's development, it is important to have high-quality instruments capable of measuring children's baseline at the start of school and their continuous learning during the first years in school. A further aim, as noted in the previous paragraph, is a measure that is comparable across countries. In this respect, the iPIPS instrument (International Performance Indicators in Primary Schools; www.ipips.org) is an international assessment of children starting school which can also be repeated at the end of the first school year. iPIPS has the potential to fill an important gap by providing comparative information about children's progress in the first year of school around the world. This paper discusses the adaptation of the iPIPS assessment for use with Brazilian children between the ages of four and seven. It debates the importance of having a baseline measure to assess early childhood education policy as well as the advantages of collecting high-quality information about children's development and skills on entry to school that can predict later literacy and mathematics outcomes (Jordan, Glutting, & Ramineni, 2010; Schneider et al., 2017; Tymms, Merrell, & Henderson, 1997; Tymms, Merrell, & Jones, 2004).
In Brazil, in the wake of the inclusion of early childhood education in the system of basic education, proclaimed in the 1988 Constitution (Brasil, 1988) [...] However, what has been observed is an expansion without suitable planning. This situation can generate low-quality services, including precariously maintained institutions, which can be harmful to child development (Campos, 2010; Rosemberg, 1999a, 1999b). In Brazil, the monitoring of the quality of education is mandated by law in the Plano Nacional de Educação - PNE [National Education Plan]. However, with regard to early childhood education, the National Education Plan only sets targets related to access (Instituto Nacional de Estudos e Pesquisas Educacionais Anísio Teixeira, 2014), and the proposal for the National Assessment of Early Childhood Education is still being developed. Moreover, there is a lack of studies that provide evidence on the requirements of the law or that examine school characteristics and policies associated with child development. At the time of writing, there are no high-quality instruments in widespread use across the country to assess child development in preschool and, therefore, it is not possible to measure the impact of education policies on children's educational trajectories.
This paper is organized in three parts. The first part discusses evidence of the impact of early childhood education on educational trajectories. The second debates the importance of having a baseline and presents the characteristics and potential of iPIPS to provide information for policy evaluation and international comparisons. The third describes the steps taken to adapt the iPIPS assessment to Portuguese (language and mathematics) and presents the main findings of the pre-test conducted in 2016 with the adapted instrument. A total of 560 children were assessed in public and private schools in three Brazilian cities. Preliminary analyses indicate that the items of both tests (language and mathematics) are psychometrically sound, suggesting the theoretical congruence of the items and a good adaptation and application protocol.

Assessing the impact of preschool: possibilities, constraints and evidence from the Brazilian context
Children's cognitive attainment and lifelong outcomes of social mobility have long been assumed to be the result of school enrolment and regular attendance. Early years education and readiness to start school have positive effects on a range of child outcomes that contribute to life-long events. Studies conducted in diverse contexts indicate that attending preschool is an effective policy for promoting greater equality of educational opportunities. It contributes to (future) learning and to a more fluid, longer-lasting school trajectory, especially among deprived children from low socio-economic backgrounds (Campbell, Pungello, Miller-Johnson, Burchinal, & Ramey, 2001; Peisner-Feinberg et al., 2001; Sammons et al., 2008; Schweinhart, 2002; Sylva et al., 2006). This section discusses the current potential and drawbacks of the large-scale assessment systems available in the Brazilian context for developing an analysis of this kind.
International longitudinal studies confirm that children who have had the opportunity to attend good-quality early childhood education environments present better cognitive and social skills during elementary school. Two studies in particular present more robust evidence. The first was conducted in the USA (Peisner-Feinberg et al., 2000), and the second in the UK (Sammons et al., 2008; Sylva et al., 2003; Sylva et al., 2006; Sylva et al., 2010). The investigations estimated the impact of the quality of early childhood education (measured on the ECERS-R scale, Early Childhood Environment Rating Scale: Revised Edition, and on the ECERS-E, Early Childhood Environment Rating Scale: Extension) on children's school trajectories.
Heckman and colleagues argue that the so-called "cognitive" and "non-cognitive" skills developed in early interventions in children's education are not only important for their development later in life, but also that these initial investments are significant for subsequent development (Heckman, 2000, 2008). Cunha, Heckman, Lochner, and Masterov (2006) state that the mastery of skills at a certain stage of the life cycle allows greater skill development in later phases (a phenomenon called "self-productivity"). Moreover, according to the authors, the investments are complementary: early investments facilitate the productivity of subsequent investments, that is, the former expand the latter's effectiveness.
Both of the aforementioned studies provide strong evidence that: (i) attending preschool increases students' chances of future success in the schooling process; (ii) attending a high-quality preschool increases subsequent learning gains (compared with not attending preschool at all or attending one of inferior quality), especially for deprived children (from poor families and/or of low socio-economic level).
Another study, conducted in 2010 in three Brazilian state capitals, investigated the impact of preschool quality (measured on the ECERS-R scale) on language development measured in the second year of primary school (Brazilian standardized test "Provinha Brasil"). The data indicated that children who had attended good-quality institutions performed better in the test than children who had not attended preschool at all or had gone to substandard institutions (Campos et al., 2011). Nevertheless, the few studies carried out in the Brazilian context were limited in their ability to trace causal relations between preschool quality and students' learning or their future trajectories. Most of them were not longitudinal, did not measure children's developmental levels at the start of preschool, and only produced measures of school inputs, such as infrastructure and pedagogical materials, and/or of students' later outcomes. Without a baseline measure, it was not possible to assess the progress made by children between the start of preschool and later outcomes.
iPIPS Brazil can make a significant contribution to policymakers and practitioners by associating measures of children's development (cognitive and motor skills) with preschool/primary education policies and teachers' practice. Such a contribution becomes more pressing in the face of the almost complete universalization of attendance in elementary school and the shift in the discussion from access to school towards effective learning and the quality of teaching provided by schools (Bonamino & Oliveira, 2013).
In the last 20 years, Brazil has created broad educational assessment systems such as the [...] Nevertheless, the data are cross-sectional and insufficient to draw causal inferences regarding the impact of educational policies and school practices or the school factors that influence pupils' learning. Franco (2001) argues that SAEB data have limitations in this respect, as proficiency is measured at only one moment and, therefore, does not express the children's learning over the years (or even at a particular school stage). Longitudinal studies are considered the "gold standard", since the measurement of previous proficiency is used as a control, thus enabling better estimation of the effects of school in its multiple facets.
In Brazil, studies with more robust designs that allow estimation of the effects of specific [...] GERES created specific tests in mathematics and language, equated with the SAEB scale.
Nevertheless, as the study made its first data collection in the 1st grade of elementary school, the design did not allow making inferences regarding the impacts of early childhood education and the then so-called literacy class on the trajectory of students' learning.

The iPIPS: structure and potential use
PIPS, Performance Indicators in Primary Schools, has been developed and used for more than 20 years in the UK and other countries (Tymms, 1999a). The aim was to create a baseline measurement for use with children at the start of formal schooling. In England, where PIPS was first developed, this meant that the assessment was used with children aged 4-5, with the aim of providing information to help teachers target their practice effectively. In 2014, around 30% of English state schools voluntarily used the assessment as a diagnostic pedagogic tool. The PIPS assessment is designed, amongst other things, to track children's progress in reading and mathematics. It was originally created to act as a baseline for children starting school in England so that progress could be assessed, at a later stage, using value-added models (Tymms, 1999a). The development work therefore focused on the best predictors of success in literacy and numeracy. There was no shortage of information on the early indicators of literacy, but there were gaps in the literature on early indicators of numeracy, and work was devoted to filling this gap (Tymms, 1999b).
The project evolved and, at teachers' instigation, the children were reassessed after just one year at school and the assessment was extended to include reading and numeracy per se; not simply the predictors. As the project expanded internationally, adjustments were made for each country, but the essence of the assessment remained, allowing the progress of children on their pathways towards literacy and numeracy to be tracked over time.
Initially called the PIPS On-Entry Baseline Assessment (Tymms et al., 2004), the instrument aims to provide schools with high-quality data on students' progress in the first stages of their schooling. The data collected are processed, and reports on the conditions and progress of a class/group are generated and delivered individually to the teachers. Results are presented at the individual level and contextualized against the progress of the class/group and of counterparts in other schools with similar characteristics (Tymms & Albone, 2002). The assessment is administered one-to-one and can be conducted by the teacher of the class/group or by another adult; the duration varies from 10 to 20 minutes. The computer program presents the questions orally and, depending on the type of question, the child responds by pointing at the screen or by saying the answer. The teacher records the answer directly on the computer screen, and the program selects the next question. The program adapts to the child's responses: in each section, after three wrong answers in a row or four wrong answers altogether, the program moves to the next section. Each section of the test presents items of progressive difficulty, which keeps the test short so that the child does not become bored with questions that are too simple or too difficult (Tymms et al., 2004).
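The section-level stopping rule described above (three wrong answers in a row, or four wrong in total) can be illustrated with a short Python sketch. This is not the PIPS software: the function name, the parameters and the representation of answers as a list of booleans are assumptions made only for illustration.

```python
from typing import Iterable, List


def administer_section(
    responses: Iterable[bool],
    max_consecutive_wrong: int = 3,  # "three wrong answers in a row"
    max_total_wrong: int = 4,        # "four wrong altogether"
) -> List[bool]:
    """Return the answers actually recorded before the section ends.

    `responses` is a hypothetical stand-in for the child's answers to the
    items of one section, in order of increasing difficulty
    (True = correct, False = wrong).
    """
    recorded: List[bool] = []
    consecutive_wrong = 0
    total_wrong = 0
    for correct in responses:
        recorded.append(correct)
        if correct:
            consecutive_wrong = 0  # a correct answer resets the streak
        else:
            consecutive_wrong += 1
            total_wrong += 1
        # Stop the section as soon as either threshold is reached.
        if (consecutive_wrong >= max_consecutive_wrong
                or total_wrong >= max_total_wrong):
            break
    return recorded


# A child who misses three items in a row stops after the fourth item.
print(administer_section([True, False, False, False, True, True]))
```

Because the items in a section are ordered by difficulty, a rule of this kind ends the section shortly after the child reaches items beyond his or her current level, which is what keeps the session within the reported 10 to 20 minutes.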
iPIPS was derived from PIPS, with the addition of the term "International" to the title of the project. Inspired by the success of the measurement program as a pedagogic tool, its providers, the Centre for Evaluation and Monitoring (CEM) at Durham University, proposed its adoption in international comparative research.
At the present stage, iPIPS is being applied to children aged 4-7 in nine countries and has already generated several scientific publications that attest to its technical quality and predictive power.
The English version of PIPS has a high test/retest reliability of 0.98 (CEM Centre, 1999) and good predictive power for reading and mathematics. Using a baseline measurement with children aged 4 or 5, the tests presented a correlation of 0.71 for reading three years after the first measurement, and a correlation of 0.7 after seven years (in England, the children were aged 11) (Tymms, 1999b; Tymms et al., 1997; Tymms et al., 2004). These are high values compared with those obtained in the other tests available, which reinforces the relevance of PIPS for use in school effectiveness studies with longitudinal or experimental designs.

Performance Indicators in Primary Schools (PIPS) assessment: adaptation and pertinence for the Brazilian context
The PIPS assessment for language and mathematics was adapted and pre-tested in Brazil so that it could be used in longitudinal studies. The study aims to follow children for three years, from preschool up to the first year of primary school (age range 4-7), in order to identify the characteristics of the institutions (school inputs, organization of the provision) and the pedagogic processes associated with students' learning, in particular literacy and numeracy.
The pre-test was administered to a total of 560 children in three cities: Rio de Janeiro, Juiz de Fora, and Petrolina. Table 1 presents the distribution of approaches taken by the researchers. Besides the aforementioned instruments, the longitudinal study also collected contextual data on the students (socio-demographic characteristics) and the schools, including both the characteristics of the provision (school inputs, organization of the provision, teacher training) and the pedagogic processes, focusing on those related to literacy and numeracy (lesson planning and organization, internal organization of groups/classes, classroom atmosphere, interaction between families and schools).
In the pre-test, the PIPS instruments were applied to children aged 4-7, in groups with quite heterogeneous home backgrounds. The main aim was to test the suitability of the assessment and to spot possible limitations in the translation or in the application protocol. Three main analyses were made with the data collected in the pre-test using Rasch measurement (Bond & Fox, 2015; Boone, 2006). The preliminary analyses indicate that the items of both tests present adequate behavior, suggesting the theoretical adequacy of the items and a good adaptation and application protocol.
Analyzing only the mathematics items, Person Reliability is .92 and Item Reliability is .97. The map of the items generated by Winsteps suggests that the PIPS scale is adequate for Brazilian students aged 4-7.
The distribution of the data (children) for each group is very close to a normal curve and the items are well distributed throughout the scale. More importantly, the level of difficulty of the different sections corresponds to the theoretical presupposition used in devising the test.
For example, items about identification of letters are, on average, easier to get right than reading words or short sentences. The same was observed regarding language: items involving identifying letters were easier compared with items that demanded reading words or passages.
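To make the Rasch quantities mentioned above more concrete, the following Python sketch shows the dichotomous Rasch model and one common definition of separation reliability (the share of observed variance in the estimated measures that is not attributable to measurement error). The function names and the toy values are assumptions for illustration only; they are not the pre-test data, and the exact computation in the software used by the authors may differ in detail.

```python
import math
from statistics import mean, pvariance
from typing import Sequence


def rasch_probability(ability: float, difficulty: float) -> float:
    """Probability of a correct answer under the dichotomous Rasch model.

    Both arguments are in logits; the probability depends only on their
    difference, so an easier item (lower difficulty) is more likely to be
    answered correctly by the same child.
    """
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


def separation_reliability(
    measures: Sequence[float], standard_errors: Sequence[float]
) -> float:
    """(observed variance - mean error variance) / observed variance.

    `measures` are estimated person (or item) measures in logits and
    `standard_errors` their standard errors. Population variance is used
    here as a simplifying assumption.
    """
    observed_variance = pvariance(measures)
    error_variance = mean([se ** 2 for se in standard_errors])
    true_variance = max(observed_variance - error_variance, 0.0)
    return true_variance / observed_variance


# Hypothetical values: a child of average ability facing an easy
# letter-identification item and a harder word-reading item.
print(rasch_probability(0.0, -1.5) > rasch_probability(0.0, 1.5))
```

Under this definition, a reliability of .92 for persons would mean that roughly 8% of the observed variance in children's measures is attributed to measurement error.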
The ladder (Tymms, Howie, Merrell, Combrinck, & Copping, 2017) in Table 2 presents an overview of the mathematics results and their pedagogical interpretation for the three grades (first and second years of preschool and first year of primary education) evaluated in 13 schools during the pre-test.
As expected, Table 2 shows a different distribution along the rungs of the ladder for students enrolled in the first and second years of preschool and in the first year of primary school. For example, at the end of the first year of preschool, most children evaluated were at the informal arithmetic level. They were able to identify single-digit numbers and to do simple informal sums. However, most children evaluated at the end of the first year of elementary education were at the simple formal arithmetic level. They had developed abilities such as identifying two-digit numbers and doing harder informal sums and simple formal sums.
Small isolated problems in the Application Booklet (support material offered to the students at the moment of the test) and in the application protocol were identified by the team and were corrected before the first wave of the longitudinal study was administered in March 2017.
The final analyses were undertaken using the data collected in the city of Rio de Janeiro, specifically for pupils enrolled in the first grade of primary school. For this particular age group, we had further data available, such as scores in SME-RJ standardized tests and family background variables. The PIPS test was applied in six primary schools, and in each school only one classroom was assessed. The variables used in the analyses are described in Chart 1 and Table 3.

Dummy: indicates that at least one of the parents completed secondary education (1=yes/0=no)
Higher Education. Dummy: indicates that at least one of the parents completed higher education (1=yes/0=no)
Source: LaPOpe UFRJ

First, we note the correlation between the pupils' language and mathematics scores in the PIPS and the SME-RJ tests. Table 4 shows a strong correlation between pupils' language and mathematics scores on the PIPS test (r = 0.719). For the SME standardized tests, we found a slightly weaker correlation (r = 0.597).
The analyses also included linear regression models estimating two outcomes, pupils' individual mathematics and language scores, for PIPS and for the SME-RJ standardized test, using contextual information about the families as independent variables (information obtained in the [...]).
The linear regression model, with no school fixed effects, suggests that 27.9% and 26.1% of the variance of the language and mathematics scores on the PIPS test, respectively, is explained by the background variables. The analyses performed with the standardized tests applied by the SME-RJ (Alfabetiza Rio, 1st grade of Primary School) to the same students indicate that the same contextual variables explain 8.9% and 16.6% of the variance of the language and mathematics scores, respectively. The results strengthen the hypothesis that the measurements generated by PIPS are of higher quality at the individual level, and they correspond to the results for Brazilian students and to other international research suggesting that the socio-economic level explains, on average, 30% of the variance in standardized tests in the USA (Sirin, 2005).
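The "variance explained" figures above are coefficients of determination (R²). As a minimal sketch of what that quantity means, the Python code below computes R² for the special case in which the only predictors are dummy variables that partition pupils into groups (for instance, highest parental education); in that case an ordinary least squares model with an intercept predicts each group's mean score. The function name, the grouping labels and the scores are hypothetical, and the models reported in the paper may include other variables.

```python
from collections import defaultdict
from statistics import mean
from typing import Hashable, Sequence


def r_squared_from_groups(
    scores: Sequence[float], groups: Sequence[Hashable]
) -> float:
    """Share of score variance explained by group membership.

    Equivalent to the R² of a regression of `scores` on a full set of
    group dummies, because such a model predicts the group mean.
    """
    grand_mean = mean(scores)
    by_group = defaultdict(list)
    for score, group in zip(scores, groups):
        by_group[group].append(score)
    group_mean = {g: mean(values) for g, values in by_group.items()}

    ss_total = sum((s - grand_mean) ** 2 for s in scores)
    ss_residual = sum(
        (s - group_mean[g]) ** 2 for s, g in zip(scores, groups)
    )
    return 1.0 - ss_residual / ss_total


# Hypothetical scores for pupils grouped by highest parental education.
scores = [0.0, 2.0, 4.0, 6.0]
groups = ["below secondary", "below secondary", "higher", "higher"]
print(r_squared_from_groups(scores, groups))
```

Read this way, a value of 0.279 would mean that about 28% of the differences between pupils' scores are associated with the family background variables, while the remainder reflects other factors and measurement error.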

Conclusion
Measuring and monitoring children's development is a key aspect of assessing the impact of educational policy and producing reliable measures about the quality of the educational system and outcomes. In Brazil, as in other countries, it is possible to observe resistance from some educational researchers regarding the potential harm that measuring children could produce. We believe that we need to develop measures in order to enhance knowledge about what children know and can do when they start school and how much progress is made in the first three years in school. A longitudinal design allows researchers to identify the most effective approaches to help all children, including those from disadvantaged backgrounds. Not knowing how children are progressing could harm poor and disadvantaged children and increase school inequality.
We have argued for the importance of having early baseline measures of children's development in order to construct robust research designs that allow causal inference, especially as the existing large-scale assessment instruments in Brazil are not adequate for such studies due to their cross-sectional nature and the absence of a baseline measure. Therefore, iPIPS can be a first step towards longitudinal studies with this purpose, as the pre-test results showed its pertinence for the Brazilian context.
The preliminary analyses indicate that the items of both tests, mathematics and language, present adequate behavior, suggesting the theoretical adequacy of the items and a good adaptation and application protocol. Analyzing only the mathematics items, Person Reliability is .92 and Item Reliability is .97. The map of the items generated by Winsteps suggests that the PIPS scale is adequate for Brazilian students aged 4-7.
Measurements generated by PIPS are of higher quality at the individual level and correspond to the results for Brazilian students and to other international research suggesting that the socio-economic level explains, on average, 30% of the variance in standardized tests in the USA (Sirin, 2005). The distribution of the data (children) for each group is very close to a normal curve and the items are well distributed throughout the scale. More importantly, the level of difficulty of the different sections corresponds to the theoretical presupposition used in devising the test. Studies of this kind can provide information for making decisions about future educational policies for preschool and the first year of primary school. We consider that there is an irreversible trend, both in Brazil and in the rest of the world, towards the adoption of systematic information, collected with growing rigor and precision, as the basis for educational policy decision-making, the so-called evidence-based policies (Dagenais et al., 2012; Miller & Pasley, 2012; Nutley, Morton, Jung, & Boaz, 2010).