Meta-evaluation of Baseline Studies of the Brazilian Family Health Strategy Expansion Project: a Participatory and Formative Approach

A participatory, formative meta-evaluation of baseline studies in Brazil is presented. International standards recommended by associations of evaluators were used, along with "specificity" criteria built from the terms of reference of the call for proposals for the selection of studies. The methodological approach combined a "peer review" of baseline study reports with a participatory (self) assessment by the "primary" evaluators; the average of these scores provided the final score. Results revealed a classification of "good" and "very good" for the set of standards. The differences in the attribution of scores further highlight the importance of taking multiple points of view into account. Given the lack of pre-existing standards for the reports and the incipient nature of evaluation focusing on utility, this meta-evaluation does not adequately reflect the quality or potential utility of the baseline studies; however, it will certainly contribute to overcoming these limitations and improving future impact studies of the Brazilian Family Health Strategy Expansion Project (PROESF).


Introduction
The Brazilian health system is currently facing the challenge of consolidating its guidelines without jeopardizing its feasibility, in a social and health context that generates increasing problems and needs for the population, and in light of scarce resources, despite the rapid and diverse technological progress in the sector. A main focus of interest for managers in confronting these challenges lies in the field of primary health care, with the Family Health Program (FHP). As such, the Project for the Expansion of the Family Health Strategy (PROESF) has been developed with the World Bank's support, with the aim of achieving, by 2010, an overall coverage of 60% in the group of 231 municipalities with over 100,000 inhabitants (total population estimated at 90.1 million individuals), up from an average coverage of 22% in 2003. This effort, which was implemented in three stages over eight years, was intended to ensure the reorganization of local systems, including improvements in the work process and in the performance of services. Investments included institutional modernization activities, improvements to the health care network, human resource development, strengthening of information systems, and monitoring and evaluation 1,2. In the implementation phase, PROESF resources corresponded to approximately R$ 147 million, of which R$ 13 million were allocated to baseline studies, an amount that is close to the 10% of investments recommended by international standards for evaluation 3,4.
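As a rough check of the figures quoted above (a sketch that uses only the rounded amounts given in the text, so the results are approximate), the share of PROESF resources allocated to baseline studies and the intended coverage gain can be written as:

```latex
\[
\frac{\text{R\$ } 13 \text{ million}}{\text{R\$ } 147 \text{ million}} \approx 0.088 = 8.8\%,
\qquad
60\% - 22\% = 38 \text{ percentage points}
\]
```

The first value is the one the text describes as close to the 10% benchmark; the second is the coverage increase targeted between 2003 and 2010.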
The agreement between the Brazilian government and the World Bank for the implementation of PROESF established a set of technical requirements, among them that baseline studies be carried out in the municipalities involved. These studies sought to characterize FHP in the initial phase of the project, for a future impact study, as well as to estimate the performance of primary health care for the group of municipalities and the observed differences between areas with and without FHP. For primary health care policy makers, the process implemented in the evaluation of PROESF represented an opportunity to adopt innovations proposed in the policy being constructed, such as broad engagement of all stakeholders, capacity building in evaluation for individuals involved in PROESF implementation in the target municipalities, and the integration of academic work and service delivery. Baseline studies were carried out between 2005 and 2006 by eight research institutions with recognized academic experience in the field of health management and evaluation, which were selected through an international bidding and selection process that grouped the municipalities into 14 contractual parts, according to population size, regional proximity, and financial threshold for the proportional allocation of resources. The analytical plan for the baseline studies was to be organized by the institutional groups according to the theoretical model (Figure 1) and a criteria matrix of its various dimensions (political-institutional, care organization, comprehensive care, and health systems performance), as defined in the terms of reference of the call for proposals. The overall alignment of studies (including database review and outputs displayed) was made under the ongoing supervision of a multidisciplinary technical and scientific follow-up group, composed of four Ministry of Health consultants and technical staff members responsible for monitoring and evaluating primary health care, with the participation of practitioners from the research institutions and municipal health care managers 5,6.
The multi-institutional production of the baseline studies (an investigation commissioned from research groups at academic institutions, selected by means of an international selection process organized in agreement with internal and external funders, namely the Ministry of Health and the World Bank) reinforced the interest in ensuring the quality and utility of the outputs, thus leading to the decision to carry out this meta-evaluation, "instituted" by the PROESF national manager. As such, it is considered an instituted evaluation, meaning a "socially organized" evaluation that involves three components: use of a well-defined methodology and instruments; socially authorized persons who conduct it; and formalized results to be used 7,8,9.

International standards for meta-evaluation studies
Operationally, meta-evaluation can be defined as a normative process of investigation, judgement and synthesis of a study or any evaluation procedure. It therefore differs from meta-analysis studies, which use statistical procedures to summarize, in a single estimate, the results of different studies testing the same hypothesis 10. Meta-evaluation therefore consists in checking the theoretical, practical and ethical consistency of research with international and/or governmental standards of quality control of public policy evaluations 11,12,13,14. This apparently simple definition masks the complexity of meta-evaluation studies as the reflection on and investigation of academic and functional practices. As in the evaluation of any intervention, meta-evaluation should preferably be introduced while the evaluative research is still under way (formative meta-evaluation), so as to contribute to study improvement, and not as a summative meta-evaluation at the end of the investigation, when it can add liability, but not more validity and utility, to the outcomes. In the case of baseline studies, as phase one of the PROESF evaluation process, the formative and participatory nature of the meta-evaluation is intended to contribute to their improvement, so that both evaluators and customers can take advantage of this double assessment, as the reflective and investigative action of academic and functional practices, also promoting bias control. The term "participatory" is used herein in the sense employed by Baron & Monier 15 of co-produced/pluralist fourth generation evaluation.
Like evaluation, meta-evaluation is expected to specify its theoretical reference, allowing the various audiences to follow the steps of researchers, in order to discuss their findings and judgments. In this sense, the choice of evaluation standards for public programs from the 1970s and 1980s, originally set in the US educational sector by the Joint Committee on Standards for Educational Evaluation (JCEE), is justified by three factors: the use of these standards to guide the actions of several professional communities of evaluators (in the United States, Canada, Europe, Africa, and so forth); the support given to this theoretical reference by international agencies concerned with the financing and evaluation of development aid programs 16, such as PROESF; and, as mentioned by Guba & Lincoln 17, the fact that its application would not go against the purposes of a fourth generation evaluation.
Given the lack of national rules (either associative or governmental), the above-mentioned principles and references -utility, feasibility, propriety and accuracy or precision -mark out, but do not restrain, the standards used in this meta-evaluation, whose "formative nature" is inescapable, given the incompleteness of baseline studies, assuming necessary adjustments for a (summative) impact analysis of PROESF, and the formal interest in deepening and qualifying the work developed so far, thus promoting its use by the organizations and institutions involved.
Although the original evaluation standards have achieved a high consensus in the Americas, for the different sectors of social policies, including the fields of health and community interventions 8,19, they should not be regarded as a blueprint in their enunciation and application, meaning that negotiations and decision-making are required in their adaptation by each "meta-evaluator". Internationally, the idea is being developed that there is a need for "open standards", due to the difficulties inherent to the transfer of parameters among different cultures and backgrounds 20, but also to the complexity of the policies and programs being evaluated. Despite the recognition of this necessary adaptation, the development of specific standards is not yet a practice. In this sense, Hartz et al. 21, in the meta-evaluation of community interventions for health promotion in the Americas, proposed a fifth guiding principle, related to the proper treatment of the "specificity" of interventions (specificity standards). This innovation in meta-evaluation studies was incorporated into the present project, on the understanding that the specificity of an intervention is rooted in the theoretical grounding underlying its potential action, a condition that cannot be dissociated from the relevance and liability of answers given in the evaluation research, which would support the assumption, adopted herein, of more (or less) utility of its outcomes for decision-makers 22,23.
Based on this assumption, the following questions have been defined as the object of this paper, regarding the meta-evaluation of the evaluative research called baseline studies of PROESF: What was the classification obtained from the reports that have been analyzed in relation to meta-evaluation international standards? To what extent did baseline studies meet the specificity standards of the intervention, in terms of their theoretical and operational consistency with the terms of reference model? How did these results vary depending on the different positions of meta-evaluators?

Methodology
Any activity involving reflective evaluation should be developed using a participatory approach oriented to foster the use of study findings 15,24. It allows the various stakeholders to (re)build the judgment of the (meta) "evaluand", which has been problematized and (re)qualified in its construction, in accordance with the notice of selection of research groups, which advocated a participatory orientation for carrying out the baseline studies whose reports are the object of the present meta-evaluation. The original study sample consisted of all municipal reports (24) concluded at the time the meta-evaluation began and submitted by the eight research institutions. Given the homogeneity of evaluation approaches within investigation groups and the time constraints for carrying out the study (3 months), the selection criterion was to include, for each group of investigators, the municipal report with the largest population size, amounting to eight reports.
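The selection rule described above (one report per research group, that of its most populous municipality) can be sketched as follows. This is only an illustration under stated assumptions: the `Report` structure, the group labels, the municipality names and the population figures are all hypothetical, since the article does not list the 24 reports.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Report:
    group: str         # research institution (hypothetical label)
    municipality: str  # hypothetical municipality name
    population: int    # hypothetical population figure


def select_largest_per_group(reports):
    """Keep, for each research group, the report of its most populous municipality."""
    best = {}
    for report in reports:
        current = best.get(report.group)
        if current is None or report.population > current.population:
            best[report.group] = report
    # Sort by group label so the result is deterministic.
    return sorted(best.values(), key=lambda r: r.group)


# Hypothetical input: two groups with several concluded reports each.
reports = [
    Report("Group A", "Municipality 1", 450_000),
    Report("Group A", "Municipality 2", 1_200_000),
    Report("Group B", "Municipality 3", 300_000),
    Report("Group B", "Municipality 4", 150_000),
    Report("Group B", "Municipality 5", 800_000),
]
selected = select_largest_per_group(reports)
```

Applied to the 24 reports from the eight institutions, a rule of this kind would yield the eight reports analyzed in the study.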
The methodological approach combined a "peer review" procedure, in two independent readings (one by an external meta-evaluator and the other by a member of the follow-up group), with a process of (self) qualification by the "primary" evaluators (research group coordinators). The average score gave the final classification of the eight reports, corresponding to a unitary sampling of each evaluation team selected for the study.
As mentioned before, the classification tool took into account the panel of checklist criteria and valuation proposed by Stufflebeam 18 for the assessment of the four international quality standards of evaluation studies: utility, feasibility, propriety, and accuracy. This choice is supported, within the health sector, by the fact that they have been widely disclosed (and in open access) over the last decade for the evaluation of disease control and prevention programs or of health promotion community programs 19,25, and have already been translated and published as educational material for the training of evaluators 26. The fifth standard, related to the project specificity, was assessed according to three criteria: consistency with the theoretical model; multidimensional and relational analysis of results, having the baseline studies terms of reference as "gold standard" 1; and the same scale of values as the remaining standards.
The testing of tools revealed that the following standards should be taken into account with maximum scoring (10 points): evaluator liability; timely delivery and dissemination of the outcomes; cost-effectiveness; formal agreements; people's rights; human relationships; conflict of interests; and fiscal accountability. This decision was made considering that these criteria were treated adequately in the international tender process and in the ethics committees of the selected research institutions, and were checked by the external committee for follow-up of the work plans and project accomplishment 6.
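The scoring procedure described in this section can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the instrument actually used: it assumes that the three readings (external meta-evaluator, follow-up group member, self-qualification) are averaged per criterion and rounded to one decimal, that the fixed criteria always receive 10 points, and that classification follows the score bands reported with the results (excellent 9.0-10.0; very good 7.0-8.9; good 5.0-6.9; weak 3.0-4.9; poor/critical below 3.0). The function names and the example scores are hypothetical.

```python
from statistics import mean

# Criteria fixed at the maximum score, as listed in the text.
FIXED_MAX_CRITERIA = {
    "evaluator liability",
    "timely delivery and dissemination of the outcomes",
    "cost-effectiveness",
    "formal agreements",
    "people's rights",
    "human relationships",
    "conflict of interests",
    "fiscal accountability",
}

# Lower bound of each band, from highest to lowest.
BANDS = [(9.0, "excellent"), (7.0, "very good"), (5.0, "good"), (3.0, "weak")]


def criterion_score(name, readings):
    """Average the readings, unless the criterion is fixed at 10 points."""
    if name in FIXED_MAX_CRITERIA:
        return 10.0
    # Assumption: scores are reported to one decimal place.
    return round(mean(readings), 1)


def classify(score):
    """Map a 0-10 score to its classification band."""
    for lower_bound, label in BANDS:
        if score >= lower_bound:
            return label
    return "poor/critical"


# Hypothetical readings: external reader, follow-up group reader, self-qualification.
score = criterion_score("justifiable conclusions", [6.0, 7.0, 9.5])
label = classify(score)
```

Averaging the three positions, rather than taking any single reading, is what lets a high self-qualification and lower external readings both contribute to the final classification.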

Main results and discussion
Table 2 shows the results of the meta-evaluation, grouped according to the five standards (average and differentiated values for the three groups of meta-evaluators); the answers to the three questions can be summarized in the findings that follow. The highest scores given by external evaluators went to parameters related to baseline studies being carried out within legal and ethical rules (propriety), and to accountability for the proper use of resources and for meeting deadlines (feasibility), which highlights the liability and responsibility of the groups selected for the studies and the support given by experts in their follow-up. Studies oriented by terms of reference, expressed in the standard "specificity", question two, was the only criterion rated as "weak" by the two groups of external auditors (3.3 and 4.5). Besides, (self) qualification scores were almost always higher than those given by the auditors, while "specificity" was rated as "excellent" (9.2) only once by the auditors of the research groups. The main purpose of the following discussion, interacting with the international literature, is to increase the understanding and contextualization of the differentials observed in the composition of standard scores, detailing them in light of existing or inescapable constraints in the meta-evaluation.
Beginning with utility, even though local actors had been identified in a workshop that was a contractual requirement to start the research, we observed that this did not imply a concern with identifying their own information needs, nor the different interests and points of view of stakeholders; only the federal manager explained his needs, through the terms of reference. Given that the participation of individuals and groups with an interest in the evaluation process is essential to promote its utility, even justifying the study being carried out, the low scores for this parameter call attention to a reduced potential use of baseline studies by municipal managers. Returning to our initial assumptions, criteria informing us about the relevance or not of the evaluation, that is, its ability to provide answers to problems facing decision-makers, were identified in the standard "utility" 22. Meeting the concrete needs of decision-makers and other stakeholders requires an evaluation practice that takes into account their backgrounds and points of view. As suggested by various studies 27,28,29, the utility of evaluations is largely determined by the perception of stakeholders about the study and its outcomes, and such perception derives from background factors and the nature of the relationship between evaluators and stakeholders. In this way, the stakeholders expect to recognize their ideas and experiences reflected in the evaluations, which they will supposedly be using in the future.
Table 2 note: excellent (9.0-10.0); very good (7.0-8.9); good (5.0-6.9); weak (3.0-4.9); poor/critical (< 3.0).
Accuracy or precision represents a key standard for expressing the methodological rigor and, consequently, the liability of an evaluation. From this point of view, the design of studies in general, according to data available in the research reports, shared an inconsistency with the terms of reference and the very focus of interventions: the nature and governance of the municipality were pushed to the background whenever the site was reduced to an observation unit, focusing the analysis on "operational lots". Another aspect to be highlighted in this parameter is the omission of, and/or low importance given to, the description of the local program to be implemented by PROESF, even though numerous studies already published about FHP show a great diversity of organizational modalities in the different municipal backgrounds. Reports also indicated problems in handling data on beneficiaries and stakeholders, seen as "information sources" rather than as partners in the evaluation process, a partnership that is extremely important for the use of study outcomes. This constraint was reflected in the assessment of the criterion "justifiable conclusions", which falls short not only because of the lack of points of view from stakeholders, but also because of partial approaches to the expected dimensions, sometimes revealing the failure of the methodological option to capture the program as a whole. A prevalent risk in evaluation studies, as pointed out by other authors 27,30, refers to the choice of the study design by the evaluator primarily according to his (or her) expertise, giving less importance to issues raised by the intervention and by stakeholders, which should define the methodological options. On the other hand, other good quality evaluations did not succeed precisely because they neglected to acknowledge the influences of interpersonal, ethical and political factors, all of which guide the work of the evaluator 31.
The specificity issue, based on the intervention logic model of the terms of reference, characterizes theory-driven evaluations, which require, for the assessment of study results, an analysis of the program inferences in their causal interactions, in addition to the ethical, political and methodological considerations observed in the remaining parameters. The program theory provides not only a guide to analyze the phenomenon, but also a framework to understand the meaning of the research findings 32. Along with relevance and liability, this theoretical grounding, taken as one of the assumptions of our study, affects the use of evaluation results for decision-making 22. If "specificity" had the lowest scoring in the independent readings of the reports, this was due to a weak expression of the proposed model: the findings of the investigated dimensions were not related to one another, and the model was used as a "static" support rather than as a set of presumably interdependent analytical categories. However, it is worth remembering that a similar result was observed in the exploratory meta-evaluation study of health promotion community interventions in the Americas 21, in which only 52% of cases were classified as "good or very good", in contrast to accuracy, which achieved this classification in 80% of the studies, suggesting that the concern with methodological rigor prevailed over the complexity of the treated objects.
The reason for the differences observed in the meta-evaluation between the participating groups was not investigated, but one contributing factor may be a limitation of this meta-evaluation study, which only used the final reports from baseline studies. The by-products of these baseline studies, such as journal articles and empowerment material, which might contain aspects missing in the final reports, could not be assessed in this first meta-evaluation. Another aspect has to do with the lack of previous orientation to research groups for the standardization of reports, with a focus on specificity and on utility for decision-makers, especially at the local level. In this sense, the reports seem to serve study funders and central level governance much more, which explains the conflict between national and local interests, a permanent challenge for any evaluator of social policies 28.
Another constraint in the search for evidence for interventions regarding complex problems such as baseline studies, which might also have interfered with their accuracy or precision, is inherent to normative analyses, as is the case of the action matrix foreseen in the terms of reference for the operationalization of the analytical dimensions of the logic model. According to Potvin 10, this occurs because of the large number of arbitrary decisions concerning which comparable results are to be summarized, and how to weigh these decisions in order to reach the conclusions. This represents another challenge for the (meta) evaluation, in settings where it has been carried out with different expectations, in which there is a counteraction between project components or dimensions that are difficult to grasp in terms of extension and depth.
On the other hand, if it is likely that the reports do not express all the richness of the material collected and analyzed for the qualification of studies, the meta-evaluation, understood not as an end in itself, but as part of an open dialogue between the many actors in this process, should contribute to organizational learning, thus overcoming a few communication barriers and promoting the use of baseline study results 33 .

Final considerations
The complexity of the intervention, represented in the terms of reference of baseline studies, required that investigators adopt multiple evaluation perspectives and strategies to understand the various dimensions of the program. Facing this challenge with the necessary consistency perhaps required more time, resources and support than those available. If the constraints under which the studies were carried out did not jeopardize their liability, then, in the desirable prospect of going back to the municipalities to present baseline study results and/or new evaluative research, the studies still need more adequate analytical procedures for an overall and multidimensional understanding of their object, as exemplified in the developments of this meta-evaluation of case studies carried out by Pontes-da-Silva & Figueiró 34 and in the recommendations for new terms of reference for evaluating PROESF impact, addressed by Felisberto et al. 2.
Reiterating the formative nature of this meta-evaluation, our major commitment has been to the learning process it provided and the use of these lessons. Therefore, in terms of the quality of this experiment, we understand this to mean not only the use of normative parameters, but also the awareness of the learning deriving from it, as well as the resultant and effective changes implied in the common project of evaluation institutionalization, or rather, acculturation. Such awareness requires that we, (meta) evaluators, be the first users of its results, as the apprentices that we are of each study carried out. Thus, for example, Medina & Fernandes 35, while investigating the development of the field of evaluation in other countries, observed, in the case of Canada, progress obtained from the identification and correction of the main problems in the development of studies (such as quality improvement of reports after standardization and definition of systematic criteria for their assessment) that jeopardized the "evaluation health" in governmental policies 36.
Regarding the institutional status of the evaluation, the sense employed herein, which also inspired and guided our study, is aligned with the literature of the field of evaluation, which moves towards increasingly participatory and democratic procedures. Differences found in the results of the different groups involved in baseline studies corroborate the importance of multiple points of view being taken into account, simultaneously allowing evaluators to understand and contribute to decision-making throughout the studies, in accordance with recognized quality parameters, from both an internal and external point of view. When we consider that the judgement of an intervention must not be the exclusive privilege of evaluators and their logics, but equally of other stakeholders, from and beyond the results of evaluations, this means that we are engaged with effective participation. It therefore ensures the primacy of our interests and needs in defining the problems and objectives that will guide the evaluations, the inclusion of our values in the judgement criteria that have been agreed upon, and consequently more useful findings, legitimizing and fulfilling our mission to have a positive influence on health policies.

Figure 1
Theoretical model as analytical landmark for the baseline studies.

Table 1
Meta-evaluation international standards.
PROESF: What was the classification obtained from the reports that have been analyzed in relation to meta-evaluation international standards?

Table 2
Final scores of baseline studies, Brazilian Family Health Strategy Expansion Project (PROESF) meta-evaluation.