Guideline for incorporating the Delphi method in the evaluation of nursing theories

Objective: to describe a guideline for the use of the Delphi method to evaluate nursing theories, from the perspective of internal validation. Method: a methodological study, targeted at the development of a guideline for the use of the Delphi method in the evaluation of nursing theories. Results: the Delphi method, principles of collective wisdom and levels of proficiency are used in the production of a guideline for organizing, searching, selecting and coordinating the activities of theoretical evaluators in teams. It distinguishes three phases for the theoretical evaluation process: Preparatory Phase (PP); Intermediate Phase (IP) and Theory Evaluation (TE) phase, incorporating Delphi-type selection procedures; search, selection and classification of judges/evaluators for the theory; definition of criteria for carrying out rounds and maintenance or removal of units of the theory evaluated. Conclusion: the developed guideline was able to adapt the elements of the Delphi method as a favorable strategy for the internal validation of nursing theories.

In the crowd wisdom theory, criteria such as independence, decentralization, diversity and aggregation would guide the constitution of groups in which the aggregate decision would surpass that of a specialist acting alone (5).
In this way, analysts, not necessarily experts in metatheory, act as judges for the content, the structure and other criteria to be judged. From the aggregate judgment, consistent results are achieved that allow for the theory evaluation to be carried out successfully. However, guidelines, methods or techniques with this conformation are not available for use with nursing theories.
Presumably, the Delphi method is adequate to evaluate a nursing theory supported by the crowd wisdom criteria, demonstrating which groups can judge adequately under conditions of uncertainty, defining the fundamental concepts, judging and adding the collective value of the ideas (5-7). It has been used to deal with issues not clarified by experimental approaches, in which the opinion of a group has value for clarifying them; it is therefore compatible with internal validation (6).
However, its application for this purpose is scarce.
Its use was identified in the literature only in a theory of the education-informatics interface, in the evaluation of the criteria of importance, precision and clarity, parsimony or simplicity, understanding, operationalization, empirical validity, fruitfulness and application (8). The methodological description in that study does not provide enough elements for its use in the evaluation of nursing theories with the formal criteria usually applied in the discipline (4).
In Brazil, the Delphi method has helped in addressing practical problems such as trend indication, obtaining consensus on a program or intervention, expert opinion for comparing treatments and, more widely, in the construction of tools for evaluation and in the creation and validation of instruments (9). The adaptation of the method for the evaluation of nursing theories remains an unexplored potential, despite its innovative character. This article was prepared in view of the scarcity of research studies and the potential of developing a guideline.
The article aims to describe a guideline for the use of the Delphi method to evaluate nursing theories, from the perspective of internal validation.

Method
This research is a methodological study for developing a guideline for the use of the Delphi method in the evaluation of nursing theories, indicating procedures for organizing, searching, selecting and coordinating the activities of theoretical evaluators in teams. The criteria of collective wisdom and levels of proficiency (5) were the reference basis, and the guideline was elaborated in Rio de Janeiro, RJ, Brazil, between November and December 2019.
The elements used in the methodological frameworks for the design, construction and testing of guidelines were incorporated, highlighting the following: selecting the topic and scope; adapting a prototype of a theoretical evaluation strategy guideline, using the Delphi method; group formation for development; systematic search for evidence; analysis and synthesis of available evidence and elaboration of the recommendation (10) .
The specific procedures for developing the guideline were the following: a simple review of manuscripts on the use of the Delphi method in theories evaluation and other applications; interpreting nursing theory evaluation methods (4,11,12).
The master's dissertation that incorporated the use of the prototype evaluated the Theory of Professional Links (13) by Meleis' theoretical evaluation strategy (14).
The study that applied the prototype of the guideline

Results
Incorporating the Delphi method into the evaluation of nursing theories, the guideline has three phases: Preparatory Phase (PP), Intermediate Phase (IP) and Theory Evaluation (TE) phase. This study details the intermediate phase, as shown in Figure 1. In the preparatory phase (PP), the theory to be evaluated is chosen and the strategy to be employed is selected from the alternatives available in the literature.
In the intermediate phase, nine procedures related to the use of the Delphi method are outlined. The first procedure is related to the type of Delphi to be used, influenced by the level of the theory to be evaluated and by its application maturity.
In the second procedure, the coordination role of the theoretical evaluation is defined, which can be accumulated with the condition of a primary evaluator. So, based on collective wisdom (5), the teams of
a) More than one participation as a guest (lecturer, speaker, commentator, an instructor/professor in a course or short course) in a scientific event to teach a theme related to nursing theories or meta-theories: 4 points
b) Participation as a guest (lecturer, speaker, commentator, an instructor/professor in a course or short course) in an event to teach a theme related to nursing theories or meta-theories: 3 points
c) Participation as a listener/participant/student in a completed event or course on nursing theories or meta-theories: 2 points
Rev. Latino-Am. Enfermagem 2021;29:e3387.

Discussion
Theoretical evaluation is able to provide elements about a "good" theory, with several formal and systematic criteria available in the literature (2,11) . However, human resources with the competence and knowledge required to and, as a final goal, determining the potential contribution of the evaluated theory for the scientific knowledge (11) .
Unlike theory analysis, which decomposes a theory to examine its parts or components (4), theoretical evaluation also judges them. However, even a theory judged to be "good" can prove to be inadequate in its descriptive, explanatory, predictive or prescriptive value once it is confirmed or applied. This places internal validation as a relevant, although not terminal, stage of a theoretical development program.
Theories that violate the virtues of a "good" theory are more difficult to refute and tend not to actually contribute to knowledge (17). The reasons for the reduced use of nursing theory evaluation strategies through formal systematic criteria are uncertain (4). However, influence can be attributed to the difficulty in obtaining evaluators with sufficient epistemic authority to judge the meta-theoretical items of internal validation. It is supposed that the strategies linked to collective wisdom can overcome this problem of dependence on the "expert" with substantial advantages (18).
The Delphi method is based on John Dewey's assumptions, emphasizing anonymous communication between individuals with expertise in a given topic, with the goal of seeking the opinion of experts in an iterative and structured way and usually seeking to achieve a consensual position (15,19). The freedom and observance of the judges' personal opinions guarantee the independence criterion of collective wisdom (5).
Regarding its use in research studies, although the method is used predominantly in mixed and quantitative designs, it also has qualitative applications, including in the construction of practical theories in the context of community organization (15). Theory evaluation is a qualitative process permeated by subjectivity and by the standards, conducts and codes of the evaluator (8).
The Delphi method can coordinate these qualitative characteristics of the evaluation process, dealing with personal variables of the independence criterion, making the most of group work. It can be used for interpretation, for predictions and for obtaining recommendations of the evaluation developed (8) .
In choosing the type of Delphi, the most common approach is the traditional one, also referred to as normative or consensus Delphi. It aims to reduce variance in the estimates and biases among experts. In contrast, the Policy Delphi, or dissent Delphi, seeks to obtain a wide range of opinions without seeking consensus (16).
For the theoretical evaluation, consensus Delphi is the most likely indication; however, the use of dissent can be recommended for theories of high originality, conceptual density, complexity and theoretical abstraction or when it is difficult to determine the consensus criteria.
Additionally, one of the goals of the evaluation can be to explore the contradictions in the production of definitions or theoretical proposals.
Regarding the characteristics of the theory, consensus Delphi can be indicated for micro- or middle-range theories with conceptualization described in more than one empirical study, or to evaluate partially disseminated, tested or used theories.
Supposedly, for consensus Delphi, the composition of teams with a high number of evaluators is only justified when it is difficult to obtain evaluators with higher levels of expertise, because it is challenging to obtain consensus in groups with many members. On the other hand, it is assumed that the dissent approach benefits from larger teams with a wide range of proficiency levels, which tend to broaden the debate from different perspectives and to bring original elements that differ from the original theory and from the primary evaluation.
Panels with more participants tend to have lower response rates, with an estimated reduction of 0.08 percentage points for each added participant (20).
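As a rough illustration of the estimate above, the sketch below treats the reported reduction of 0.08 percentage points per added participant as linear. The linearity, the function name and the panel sizes are assumptions made here for illustration only and are not taken from the cited study (20).

```python
# Reported estimate (20): the response rate falls by about 0.08 percentage
# points for each participant added to a Delphi panel.
REDUCTION_PER_PARTICIPANT_PP = 0.08


def expected_response_rate_drop(current_size: int, new_size: int) -> float:
    """Return the estimated drop in response rate, in percentage points.

    Assumes (for illustration only) that the effect is linear in panel size.
    """
    if current_size < 1 or new_size < current_size:
        raise ValueError("new_size must be >= current_size >= 1")
    added = new_size - current_size
    return round(added * REDUCTION_PER_PARTICIPANT_PP, 2)


# Example: growing a panel from 5 to 20 judges adds 15 participants.
drop = expected_response_rate_drop(5, 20)
print(f"Estimated drop: {drop} percentage points")
```

Under this assumption the expected loss stays small for panels of the sizes discussed here, so the choice of team size is likely to depend more on the availability of qualified evaluators than on the response rate.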
A panel of 5 to 20 experts is indicated when the recommendation is based exclusively on the characteristics of the Delphi method (20). Studies on the development and application of Core Outcome Sets (COS) have used the Delphi method to determine which results to measure, with a predominance of Delphi panels of up to 50 people (20).
Borel MCG, Lopes ROP, Thofehrn MB, Nóbrega MML, Arreguy-Sena C, Brandão MAG.
In theoretical evaluation, it is challenging to establish a minimum and maximum number of evaluators/judges, due to the philosophical character and abstract epistemological nature inherent to theorization. For example, for new or poorly disseminated theories, it can be difficult to have many secondary evaluators with adequate expertise. On the other hand, large teams of beginner evaluators may lack knowledge of a metatheoretical nature, causing a dispersion of perspectives that would hinder the aggregation of ideas. In this case, the guideline seeks to circumvent these limits by balancing the diversity criterion of the principle of collective wisdom with the expertise required for theoretical evaluation (4,5).
The prototype of the guideline included four evaluators with three different expertise levels, and three secondary evaluators who together summed 36 points (14, 13, and 9 individual points). According to the expertise points, the criteria for defining the team were useful for the composition of this small group, as the configuration with fewer participants guaranteed the maximum response rate, as expected for this panel size (20). The differences in the evaluators' training levels and backgrounds ensured the decentralization criterion (5).
However, whenever possible, it is recommended to assemble teams with five or more judges.
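The arithmetic behind the prototype team can be sketched as follows. The point values (14, 13 and 9) and the recommendation of five or more judges come from the text; the function and field names are hypothetical, and the check is only an illustration, not part of the guideline.

```python
RECOMMENDED_MIN_JUDGES = 5  # "five or more judges", as recommended in the text


def summarize_team(total_evaluators: int, secondary_points: list[int]) -> dict:
    """Summarize a team: summed expertise points and a size check."""
    return {
        "total_evaluators": total_evaluators,
        "secondary_points_sum": sum(secondary_points),
        "meets_recommended_size": total_evaluators >= RECOMMENDED_MIN_JUDGES,
    }


# Prototype team: four evaluators; the three secondary evaluators
# scored 14, 13 and 9 expertise points.
prototype = summarize_team(4, [14, 13, 9])
print(prototype)
```

The summary shows that the prototype team reaches the reported 36 points while staying below the recommended size, which is consistent with the recommendation being framed as "whenever possible".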
Patricia Benner's model (21)(22) with its five levels of the diversity criterion of collective wisdom (5) .
Studies commonly apply two to three rounds of the Delphi method (19). However, the multiple criteria to be evaluated and the high number and diversity of evaluator profiles may require more rounds to reach consensus. It is desirable to plan a minimum number of rounds according to the number of evaluators, so that an excessive effort to manage the task results does not fall on the Delphi coordinator and compromise their quality.
The scope level of theories can influence the definition of criteria to be evaluated by judges; for example, when a given middle-range theory is evaluated as a model, even more specific and empirical criteria can be used (12) .
However, this does not directly interfere with the nature of the Delphi method as a strategy.
The decision to reach consensus among judges is a mechanism to meet the aggregation criterion of collective wisdom, transforming individual judgments into a team decision (5). The consensual decision can start from the evaluators' own opinion that a consensus was reached; however, it is recommended that this not be assumed automatically upon completion of the Delphi technique (19).
It is necessary to specify which conditions are required for reaching consensus when the decision is qualitative. When quantitative measurement procedures are adopted, the measures and cut-off points established will be used to determine the degree of agreement or disagreement compatible with consensus or dissent (19).
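A minimal sketch of one possible quantitative rule is given below. It assumes that judges rate each unit of the theory on a 1 to 5 scale, that ratings of 4 or 5 count as agreement, and that consensus requires at least 75% agreement. The scale, the threshold and all names are assumptions for illustration; the guideline does not prescribe them, and each study should define its own measures and cut-off points (19).

```python
def percent_agreement(ratings: list[int], agree_from: int = 4) -> float:
    """Percentage of judges whose rating is at or above `agree_from`."""
    if not ratings:
        raise ValueError("at least one rating is required")
    agreeing = sum(1 for r in ratings if r >= agree_from)
    return 100.0 * agreeing / len(ratings)


def classify_unit(ratings: list[int], cutoff: float = 75.0) -> str:
    """Label a unit of the theory as 'consensus' or 'no consensus'."""
    return "consensus" if percent_agreement(ratings) >= cutoff else "no consensus"


# Hypothetical ratings from four judges for two units of a theory.
round_ratings = {
    "concept definition": [5, 4, 4, 3],  # 3 of 4 judges agree
    "proposition": [5, 2, 4, 3],         # 2 of 4 judges agree
}
for unit, ratings in round_ratings.items():
    print(unit, percent_agreement(ratings), classify_unit(ratings))
```

Units labeled "no consensus" would typically return to the judges in a further round; whether they are eventually kept or removed depends on the criteria defined by the coordinator.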
There are no mandatory rules for consensus building,  to exclusion or maintenance after evidence obtained from experimentation or field research (1) .
It is highlighted that, from the evaluation of the Theory of Professional Links (27)(28) , emerging factors demanded changes to criteria not detailed when the prototype was elaborated, which contributed to the deepening for the creation of the guideline presented in this article.
The study is not free of limitations. The focus of any research using the Delphi method will always be obtaining high-quality answers from a selection of expert individuals (29). However, the internal validation of a theory deals with theoretical-philosophical criteria that can make it difficult for a secondary evaluator to produce answers or judge their quality, because of the nature of the object evaluated and of the judgment to be made. For example, the conceptual definition is one of the elements of a theory, evaluated in its semantics, logic, and context (14). Notably, it can be difficult to judge what counts as a "good answer" for such a complex construct, given these properties.
The limitation imposed by the subjectivity of the judge's judgment in theory evaluation must be weighed against the philosophical root of the theorist and of the evaluator.
Critical-social, hermeneutic or new pragmatism roots tend to deal with greater fluidity in the face of different perspectives, including exploring them in consensus or in dissent. On the other hand, as it requires greater objectivity of reality, post-positivism requires more stable, generalizable or measurable criteria (30). In this last philosophical root, methods such as structural equation modeling, factor analysis and multiple regression may be the best choice for theory evaluation, with criteria closer to external validation (4).
Among the contributions for the advancement of
The main goal of the evaluation is to identify a "good theory", which implies judging the adequacy of its components; however, evaluators must perform this procedure with extreme caution, bearing in mind that a theory has a hierarchy and relationships among its elements.

Conclusion
The use of a guideline prototype in the evaluation of a middle-range nursing theory, the Theory of Professional Links, brought satisfactory results that suggest its feasibility and pointed out ways for refinement.
It is essential that other researchers replicate its use in the evaluation of grand- and micro-range theories for future adjustments and updates of the guideline, also adopting evaluation strategies by formal criteria different from the one used in the prototype.