
Revista da Escola de Enfermagem da USP

Print version ISSN 0080-6234

Rev. esc. enferm. USP vol.46 no.4 São Paulo Aug. 2012 



Nursing diagnoses analysis under the Bayesian perspective



Marcos Venícios de Oliveira Lopes (I); Viviane Martins da Silva (II); Thelma Leite de Araujo (III)

(I) RN. Ph.D. in Nursing. Statistician. Associate Professor at Universidade Federal do Ceará. Fortaleza, CE, Brazil.
(II) RN. Ph.D. in Nursing. Adjunct Professor at Universidade Federal do Ceará. Fortaleza, CE, Brazil.
(III) RN. Ph.D. in Nursing. Associate Professor at Universidade Federal do Ceará. Fortaleza, CE, Brazil.




The use of Bayesian statistical techniques is a well-accepted and established approach in fields outside of nursing as a paradigm to reduce the uncertainty present in a given clinical situation. The purpose of this article is to provide guidance regarding the specific use of the Bayesian paradigm in the analysis of nursing diagnoses. The steps and interpretations of Bayesian analysis are discussed. One theoretical and one practical example of Bayesian analysis of nursing diagnoses are presented. The article describes how the Bayesian approach can be used to summarize the available knowledge and make point and interval estimates of the true probability of a nursing diagnosis. It is concluded that the application of Bayesian statistical methods is an important tool for a more accurate definition of probabilities related to nursing diagnoses.

Descriptors: Nursing diagnosis; Methodology; Statistical analysis




Although the lack of attention to diagnostic accuracy is a problem described in the literature(1), it is known that the use of valid and reliable nursing diagnoses strengthens professional responsibility, practice and basic nursing research(2). In this sphere, it has been argued that evidence on accurate nursing diagnoses or the best nursing intervention choices is still relatively scarce in comparison with the biomedical literature(3). In addition, nurses need skills to reach decisions on more complex problems, starting from a well-structured knowledge base(4).

The accuracy of nursing diagnoses has been described in view of individual clinical judgments in specific situations(5). On the other hand, discussions on how to accurately analyze a set of data concerning nursing diagnoses have been scarce. In most cases, these discussions are centered on the description of sample percentages taken from specific populations. Despite the existence of different analysis techniques that can be used to enhance diagnostic precision and accuracy, the literature indicates that, to cope with the uncertainties present in different clinical situations, nurses commonly use knowledge based on their experience, which is considered a necessary but not a sufficient basis for this end(6-7).

Hence, the definition of the prevalence, incidence or even the impact of nursing diagnoses in a specific population needs to be more precise and accurate, with a view to guiding more adequately the choice of the intervention or information to be delivered to patients(7). Given that nurses work in conditions of uncertainty, dealing with probabilities, skills need to be developed to assess the best estimates in the light of available evidence(8). Moreover, nurses are increasingly asked to order and interpret diagnostic tests, making it essential for them to understand the importance of recognizing basic standards and measures associated with clinical conditions(7).

In this respect, clinical judgment (specifically) and the identification of the occurrence patterns of a nursing diagnosis (in general) represent a process of information accumulation aimed at reducing our uncertainty about these diagnoses. A study about the concept of uncertainty concluded that, in nursing, additional research is needed to explore the extent of probabilistic reasoning and its effects on our degree of uncertainty about the phenomena specific to the profession(9).

In the probabilistic domain, decisions are based on predicting the probability of particular patient outcomes(10). In their pure sense, probabilities represent chance, that is, a numerical measure of the uncertainty associated with an event or events. The probabilistic approach provides information about the degree of uncertainty of a diagnosis, and can thus be used as a basis to improve practice(7,10-11).

In this sphere, statistical analysis plays a fundamental role. More recently, researchers in different areas have taken an interest in a branch of statistics that deals with the definition of probabilities from the subjective viewpoint: Bayesian statistics. The use of Bayesian statistical techniques is a well-accepted and established approach in areas other than nursing(8).

Some authors consider that Bayesian thinking can help reduce uncertainty about nursing phenomena and thus contribute to accurate analyses(12). Other authors report that, in their study of the quality of nursing judgment and decision making, none of the studies analyzed used this approach as a criterion to compare nurses' judgments and decisions(11). These same authors add that, independently of discussions about the existence or not of a 'pragmatic reality', Bayesian analysis can be an important additional tool for measurement purposes in research on judgment and decision making.

In this sense, the Bayesian paradigm studies precisely the uncertainty about the parameters of interest to scientists and uses the probability concept that corresponds to the use of this word in daily life(13). Therefore, the development of a specific probabilistic notation for Bayesian analysis of nursing diagnoses can contribute to discussions about and improvements in the characterization process of diagnostic profiles(8,11).

Based on the above considerations, the aim in this study is to describe the bases of Bayesian analysis and present directions for the specific use of this paradigm in the analysis of nursing diagnoses. The final description in this paper includes the fundamental steps for the probability determination of a human response, based on preliminary knowledge (a priori distribution) and on knowledge provided by research data (likelihood).



Two main statistical approaches exist: the classical (also known as frequentist or objective) and the Bayesian(14). The basic difference between these two approaches is the probability concept. Although different definitions of probability exist, the two of interest in this paper are empirical and subjective probabilities. The empirical probability view is defined by the proportion or relative frequency of the observed event in the long term (in statistical jargon, "when the sample size tends to infinity"). On the other hand, the subjective probability view refers to the personal measure of uncertainty based on available evidence. That is the view the whole Bayesian analysis rests on.

It is important to highlight that both approaches tend to present the same numerical results in situations in which our conclusions are based only on empirical observations (research data). Their interpretations, however, are completely different. Moreover, subjective probabilities can be used to analyze events for which we have no previous empirical data. These aspects are discussed below. Finally, subjective probabilities can be considered more general than empirical probabilities, as the former can also be used to measure our uncertainty about unique and particular events(14).

As described in many studies, the Bayesian approach originates in the work of the British cleric Thomas Bayes about equality between general probabilities, which became known as Bayes' theorem. The Bayesian proposal for diagnostic probability analysis is strongly based on axiomatic foundations, which provide a unified logical structure, contributing to consistent data evaluation(15). In terms of nursing diagnoses, the probability of a human response is interpreted as a conditional measure of uncertainty about this response event, considering specific defining characteristics and the context in which the health assessment took place.

In practical terms, nurses are confronted with a set of patients, each of whom can present the human response of interest with a certain degree of probability. In this context, the definition of the actual probability that a human response will occur in a given population depends on preliminary knowledge about the human response in that population and on available clinical information. This characterizes nursing diagnosis analysis as a problem of measuring the degree of uncertainty. In notational form, the probability θ of a human response based on available information k, represented as p(θ|k), is a measure of the degree of belief in the presence of response θ, as suggested by the information contained in k. Hence, the probability attributed to a response is always conditional on the information one has about it in a given clinical situation.

Thus, considering a set of defining characteristics, Bayesian analysis is based on the fact that the final probability of a human response event (a posteriori probability), conditional on the information obtained through a data survey (research), is proportional to the probability of obtaining that research sample (likelihood) multiplied by the probability that was initially attributed to the response (a priori probability). As described, the likelihood function is directly related to the human response event identified in the research data, while the a priori probability represents the prevalence of the human response based on previously existing knowledge.
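The proportionality described in words above can be written compactly. The block below is a standard statement of Bayes' theorem in the notation of this paper; the normalizing term p(k) is not named in the text and is introduced here only to show why the relation is one of proportionality.

```latex
% Bayes' theorem for a human response with probability \theta and research data k
p(\theta \mid k) = \frac{p(k \mid \theta)\, p(\theta)}{p(k)},
\qquad
p(k) = \int_{0}^{1} p(k \mid \theta)\, p(\theta)\, d\theta

% p(k) does not depend on \theta, so it acts only as a normalizing constant:
p(\theta \mid k) \;\propto\; p(k \mid \theta)\, p(\theta)
% posterior \propto likelihood \times prior
```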

In short, Bayesian analysis involves three phases: establishment of initial information and its corresponding initial probability distribution; likelihood information based on the obtained data; and establishment of posterior probability, which combines preliminary information with the research data(16). Each of these phases has essential characteristics that should be carefully considered to analyze the human response of interest.



Bayesian methods demand the choice of a prior probability distribution. In this phase, researchers should define a single probability distribution that describes available knowledge on the human response of interest, and use Bayes' theorem to combine this with the information the research data provide in the likelihood function. This phase is a hard task, though, because the available prior information may only weakly approximate an actual prior distribution. The naïve use of simple prior distributions, like the search for presumably non-informative probability distributions, can hide important unproven assumptions, which can easily dominate or invalidate the analysis(17).

In technical terms, the definition of the prior distribution depends first on how the variable of interest was defined. A nursing diagnosis can be considered a discrete variable, as its values are finite in a given interval, represented by the number of individuals with the diagnosis of interest. The probability distributions of discrete variables of direct interest for nursing diagnosis analysis include the binomial distribution. Its probability function is defined as follows:

$$P(H = h) = \binom{n}{h}\,\theta^{h}\,(1 - \theta)^{n - h}$$

In this case, P(H = h) represents the probability of finding exactly h individuals with the human response of interest in the observed sample; n is the total number of individuals assessed, h represents the number of individuals with the human response and θ is the probability that the human response of interest will occur.
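As a minimal sketch of the binomial probability function P(H = h) just described, the following Python code evaluates it for every possible count h. The sample size of 10 patients and the value θ = 0.4 are hypothetical and serve only as an illustration.

```python
from math import comb


def binomial_pmf(h: int, n: int, theta: float) -> float:
    """P(H = h): probability that exactly h of n individuals show the response."""
    return comb(n, h) * theta**h * (1.0 - theta) ** (n - h)


# Hypothetical illustration: 10 patients assessed, response probability 0.4.
n, theta = 10, 0.4
probabilities = [binomial_pmf(h, n, theta) for h in range(n + 1)]

for h, p in enumerate(probabilities):
    print(f"P(H = {h:2d}) = {p:.4f}")

# The probabilities of all possible counts add up to 1.
print(f"total = {sum(probabilities):.4f}")
```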

The use of so-called conjugate distribution families can facilitate Bayesian analysis. The statistical term conjugate family refers to the fact that, for any initial distribution belonging to a distribution family, the final distribution will belong to the same family. In many circumstances, the binomial distribution is conjugate with the Beta distribution family(18). The density of the Beta distribution is given by the following formula:

$$Be(\theta \mid \alpha, \beta) = \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\,\Gamma(\beta)}\,\theta^{\alpha - 1}\,(1 - \theta)^{\beta - 1}$$

where Γ is the Gamma function.
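The Beta density, with its Gamma-function normalizing term, can be evaluated with the Python standard library alone. The sketch below assumes the usual parameterization with two positive parameters α and β; the helper `integrate` and the parameter values 2 and 3 are hypothetical and are used only to check that the density integrates to 1.

```python
from math import exp, lgamma, log


def beta_pdf(theta: float, alpha: float, beta: float) -> float:
    """Density of the Beta(alpha, beta) distribution at theta, for 0 < theta < 1."""
    if not 0.0 < theta < 1.0:
        return 0.0
    # Work on the log scale: Gamma(alpha + beta) overflows for large parameters.
    log_norm = lgamma(alpha + beta) - lgamma(alpha) - lgamma(beta)
    log_kernel = (alpha - 1.0) * log(theta) + (beta - 1.0) * log(1.0 - theta)
    return exp(log_norm + log_kernel)


def integrate(f, a: float = 0.0, b: float = 1.0, steps: int = 10_000) -> float:
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    width = (b - a) / steps
    return sum(f(a + (i + 0.5) * width) for i in range(steps)) * width


# Hypothetical parameters, chosen only to verify the normalization.
area = integrate(lambda t: beta_pdf(t, 2.0, 3.0))
print(f"area under the Beta(2, 3) density = {area:.4f}")
```

Computing the normalizing term through `lgamma` rather than the Gamma function itself is a design choice that keeps the calculation stable for the large parameter values that arise after updating with sizeable samples.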

In summary, the above distribution measures how probable the occurrence of a given human response is, in view of the described initial knowledge and the knowledge available in the scientific literature(15,19). On the other hand, discussion is ongoing about situations in which we supposedly have no information about the probability and/or probability distribution of the event of interest. Some authors recommend the use of non-informative distributions. These distributions should represent our lack of knowledge about the phenomenon. To give an example, if we are interested in identifying the actual occurrence proportion of a nursing diagnosis, a non-informative distribution would be a binomial distribution with p = 0.5, i.e. in this case, we are considering that the probability that an individual will present the nursing diagnosis in question is similar to tossing a coin and deciding according to the result. A priori distributions are highly subjective, and another strategy to identify them is discussion with expert panels. The aim of these panels is to establish a consensus on the probability of the event in question.



Next, based on the data collected in a study, the likelihood function is established, where p(k|θ) is a function that describes the likelihood of the distinct values of human response θ in the light of the data k the research provided(15). In fact, the θ values that increase p(k|θ) are those that make the result k, which was eventually observed, more plausible. Consequently, after observing the sample data, the value of the observed human response proportion is more likely than the others.

In this context, sufficient statistics are needed. A sufficient statistic is a function that contains all the information about the human response available in the research data. For nursing diagnoses, a sufficient statistic is the proportion of individuals with the diagnosis. The likelihood estimates are simply the estimates of the parameters of interest obtained from the study sample, as is common in classical statistical analyses.
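To make the idea concrete, the sketch below evaluates a binomial likelihood p(k|θ) on a grid of candidate θ values and confirms that it peaks at the observed sample proportion. The counts (12 of 40 patients) and the grid resolution are hypothetical, and the binomial form of the likelihood is an assumption consistent with the distribution introduced earlier.

```python
from math import comb


def likelihood(theta: float, k: int, n: int) -> float:
    """Binomial likelihood p(k | theta) of k responses among n individuals."""
    return comb(n, k) * theta**k * (1.0 - theta) ** (n - k)


# Hypothetical data: 12 of 40 assessed patients present the diagnosis.
k, n = 12, 40

# Candidate values of theta between 0.001 and 0.999.
grid = [i / 1000 for i in range(1, 1000)]
best_theta = max(grid, key=lambda t: likelihood(t, k, n))

print(f"sample proportion k/n         = {k / n:.3f}")
print(f"theta maximising p(k | theta) = {best_theta:.3f}")
```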



Bayes' theorem can be expressed simply in words by the statement that the a posteriori distribution is proportional to the likelihood times the a priori distribution(15,19). The posterior distribution is centered on a point that represents a balance between prior information and the data obtained in the research, so that the data control this balance to a greater extent as the sample size increases(16).
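This balance can be illustrated numerically. The sketch below assumes the conjugate case of a Beta(α, β) prior combined with binomial data, for which the posterior mean is a weighted average of the prior mean and the sample proportion. The prior parameters (8 and 12) and the observed proportion of 20% are hypothetical.

```python
def posterior_mean(alpha: float, beta: float, k: int, n: int) -> float:
    """Mean of the Beta(alpha + k, beta + n - k) posterior distribution."""
    return (alpha + k) / (alpha + beta + n)


def prior_weight(alpha: float, beta: float, n: int) -> float:
    """Share of the posterior mean that comes from the prior mean."""
    return (alpha + beta) / (alpha + beta + n)


# Hypothetical setting: prior mean 0.40 (alpha = 8, beta = 12),
# while 20% of every sample shows the human response.
alpha, beta = 8.0, 12.0
for n in (10, 100, 1000, 10000):
    k = n // 5
    w = prior_weight(alpha, beta, n)
    mean = posterior_mean(alpha, beta, k, n)
    # Identity: posterior mean = w * prior mean + (1 - w) * sample proportion.
    blended = w * alpha / (alpha + beta) + (1.0 - w) * k / n
    print(f"n = {n:5d}  prior weight = {w:.3f}  "
          f"posterior mean = {mean:.4f}  blended = {blended:.4f}")
```

As n grows, the prior weight shrinks toward zero and the posterior mean approaches the sample proportion, which is the behaviour described in the text.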

From a Bayesian viewpoint, the final result of an inference problem about any unknown quantity is simply the corresponding posterior distribution. Thus, everything that can be said about any human response θ among the parameters of the probabilistic model is contained in the posterior distribution p(θ|k)(15). In summary, the final probability distribution p(θ|k) of a human response is extracted from research results about the prevalence of the human response, defined through the model p(k|θ), combined with the initial distribution p(θ) of the human response of interest (initially supposed prevalence).

To define a final distribution, conjugate distribution families are generally used. In some situations, however, the distributions may not take a known or even simple form. In these cases, data simulation methods have been used to draw conclusions from the Bayesian analysis. As this topic is somewhat complex, it will not be discussed here. Interested readers can consult the large bibliography available on Markov chain Monte Carlo simulation methods, EM algorithms etc.

To estimate nursing diagnosis event proportions, if the research data follow a binomial distribution and the a priori distribution belongs to the Beta family, the conjugate final distribution will also be a Beta distribution. Thus, the final distribution p(θ|k) of a human response of interest can be treated as a Beta distribution with the parameters (α + k, β + n - k), where α and β represent parameters that can be calculated from the mean and variance of the initial distribution, n is the number of individuals assessed and k the number of individuals with the phenomenon of interest(17,20).
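The parameters (α + k, β + n - k) follow from multiplying a binomial likelihood by a Beta prior and collecting the exponents of θ and (1 - θ). The short derivation below is a standard result, written in the notation of this paper; constants that do not depend on θ are dropped.

```latex
% Likelihood of k responses among n individuals (binomial), up to a constant:
p(k \mid \theta) \;\propto\; \theta^{k}\,(1 - \theta)^{n - k}

% Prior Be(\theta \mid \alpha, \beta), up to a constant:
p(\theta) \;\propto\; \theta^{\alpha - 1}\,(1 - \theta)^{\beta - 1}

% Posterior = likelihood \times prior, up to a constant:
p(\theta \mid k) \;\propto\;
  \theta^{k}\,(1 - \theta)^{n - k}\;\theta^{\alpha - 1}\,(1 - \theta)^{\beta - 1}
  = \theta^{(\alpha + k) - 1}\,(1 - \theta)^{(\beta + n - k) - 1}

% This is the kernel of a Beta distribution, hence
p(\theta \mid k) = Be(\theta \mid \alpha + k,\; \beta + n - k)
```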



The intent is to discover the unknown proportion of the nursing diagnosis Ineffective Breathing Pattern, indicated by θ here, in a population of children with pneumonia. After a bibliographic survey, prevalence rates of θ are identified between 0.20 and 0.60. After assessing 1000 children with pneumonia, 200 displayed the nursing diagnosis of interest. To define the final probability distribution of the Ineffective Breathing Pattern diagnosis in this population, we will proceed as follows:

Step 1: Establish the initial distribution:

We consider θ a variable whose value can vary in the interval between 0 and 1, making it a continuous variable. For the case of a human response, the distribution that treats θ as a continuous variable is the Beta distribution. This distribution is characterized by two positive constants α and β, and the values of its variable lie in the interval between 0 and 1. To find the Beta distribution of interest, let us start by specifying the mean and standard deviation of θ. In our case, we can assume a mean of θ = 0.40 (mean prevalence rate for the diagnosis Ineffective Breathing Pattern). As the largest part of our probability has to range between 0.20 and 0.60, it is reasonable to suppose that there are two standard deviations from the mean 0.40 to 0.60, which equals a distance of 0.20. One standard deviation is then equivalent to 0.10. One property of the Beta distribution is that α and β can be found from the mean and variance, applying the following formulae: α = μ{[μ(1 - μ)/σ²] - 1}; β = [1 - μ]{[μ(1 - μ)/σ²] - 1}, where μ indicates the mean of θ and σ the standard deviation.

For our example:

α = 0.4{[0.4 (1 - 0.4)/ 0.01] - 1} = 9.2

β = [1 - 0.4]{[0.4 (1 - 0.4)/ 0.01] - 1} = 13.8
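The two formulae for α and β can be wrapped in a small helper, shown below as a sketch. The function name `beta_params_from_moments` is hypothetical; the inputs are the mean of 0.40 and standard deviation of 0.10 used in this example.

```python
def beta_params_from_moments(mean: float, sd: float) -> tuple[float, float]:
    """Return (alpha, beta) of the Beta distribution with this mean and sd."""
    variance = sd**2
    # A Beta distribution with this mean exists only for small enough variance.
    if not 0.0 < mean < 1.0 or variance >= mean * (1.0 - mean):
        raise ValueError("need 0 < mean < 1 and sd**2 < mean * (1 - mean)")
    common = mean * (1.0 - mean) / variance - 1.0
    return mean * common, (1.0 - mean) * common


alpha, beta = beta_params_from_moments(mean=0.40, sd=0.10)
print(f"alpha = {alpha:.1f}, beta = {beta:.1f}")

# Rounding to whole numbers, as done in the text for the initial distribution.
print(f"rounded: Be(theta | {round(alpha)}, {round(beta)})")
```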

To avoid complex calculations with the Gamma function due to non-whole values for α and β, we will round off the values of α and β. Then, the initial distribution is: p(θ) = Be (θ|9, 14).

Step 2: Establishment of Likelihood distribution:

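In the analysis of the 1000 patients, 200 of whom presented the nursing diagnosis under analysis, the binomial distribution is used to calculate the probability extracted from the data:

$$p(k \mid \theta) = \binom{n}{k}\,\theta^{k}\,(1 - \theta)^{n - k} = \binom{1000}{200}\,\theta^{200}\,(1 - \theta)^{800}$$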

Step 3: Establishment of final posterior distribution:

The a posteriori distribution is obtained by the product of the a priori distribution and the likelihood:

p (θ|k) = Be (θ|α + k, β + n - k) = Be (θ|9 + 200, 14 + 1000 - 200) = Be (θ|209, 814), where n is the number of patients assessed and k the number of patients with the nursing diagnosis under analysis (Ineffective breathing pattern).
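The updating rule used in Step 3 amounts to two additions. The sketch below reproduces the parameters of this example; the function name `update_beta` and the split of the 1000 children into two batches are hypothetical and serve only to show that sequential updating leads to the same posterior distribution.

```python
def update_beta(alpha: float, beta: float, k: int, n: int) -> tuple[float, float]:
    """Posterior Beta parameters after k responses among n individuals."""
    if not 0 <= k <= n:
        raise ValueError("k must lie between 0 and n")
    return alpha + k, beta + (n - k)


# Prior Be(theta | 9, 14); 200 of 1000 children presented the diagnosis.
post_alpha, post_beta = update_beta(alpha=9, beta=14, k=200, n=1000)
print(f"posterior: Be(theta | {post_alpha}, {post_beta})")

# Updating in two batches gives the same posterior as a single update,
# which illustrates how knowledge accumulates across studies.
step1 = update_beta(9, 14, k=80, n=400)       # hypothetical first batch
step2 = update_beta(*step1, k=120, n=600)     # hypothetical second batch
print(step2 == (post_alpha, post_beta))
```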

Step 4: Calculation of descriptive means and credibility intervals:

The mean of the posterior distribution is calculated as μ = α/(α + β) = 209/(209 + 814) = 0.204

The variance of the posterior distribution is σ² = [μ(1 - μ)]/(α + β + 1) = [0.204(1 - 0.204)]/(209 + 814 + 1) = 0.000158 and the a posteriori standard deviation equals 0.0126.

As this Beta distribution is approximately symmetrical and can be approximated by a normal distribution, and given that the standard deviation of θ was calculated as 0.0126, for a 95% credibility level we take 1.96 standard deviations, equaling 0.0246. Subtracting this value from the mean and adding it to the mean gives a 95% credibility interval of 0.1796 - 0.2289. It can thus be concluded intuitively, with 95% credibility, that the probability of the diagnosis Ineffective Breathing Pattern ranges between 17.96% and 22.89%. Observe that the interpretation of Bayesian credibility intervals differs from the interpretation of confidence intervals in the classical approach.
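The calculations of Step 4 can be reproduced with the sketch below, which also compares the normal approximation with quantiles obtained by numerically integrating the Beta density. The helper `beta_quantile` and its grid size are assumptions made for illustration and are not part of the original analysis; small differences in the last decimal place relative to the figures in the text are due to rounding.

```python
from math import exp, lgamma, log, sqrt


def beta_summary(alpha: float, beta: float) -> tuple[float, float]:
    """Mean and standard deviation of a Beta(alpha, beta) distribution."""
    mean = alpha / (alpha + beta)
    variance = mean * (1.0 - mean) / (alpha + beta + 1.0)
    return mean, sqrt(variance)


def beta_pdf(theta: float, alpha: float, beta: float) -> float:
    """Beta density for 0 < theta < 1, computed on the log scale."""
    log_norm = lgamma(alpha + beta) - lgamma(alpha) - lgamma(beta)
    return exp(log_norm + (alpha - 1) * log(theta) + (beta - 1) * log(1 - theta))


def beta_quantile(prob: float, alpha: float, beta: float, steps: int = 20_000) -> float:
    """Approximate quantile from a midpoint-rule cumulative sum of the density."""
    width = 1.0 / steps
    cumulative = 0.0
    for i in range(steps):
        cumulative += beta_pdf((i + 0.5) * width, alpha, beta) * width
        if cumulative >= prob:
            return (i + 1) * width
    return 1.0


alpha, beta = 209, 814
mean, sd = beta_summary(alpha, beta)
normal_interval = (mean - 1.96 * sd, mean + 1.96 * sd)
exact_interval = (beta_quantile(0.025, alpha, beta), beta_quantile(0.975, alpha, beta))

print(f"posterior mean = {mean:.4f}, standard deviation = {sd:.4f}")
print(f"normal approximation : {normal_interval[0]:.4f} - {normal_interval[1]:.4f}")
print(f"numerical integration: {exact_interval[0]:.4f} - {exact_interval[1]:.4f}")
```

With a posterior distribution based on more than a thousand observations, the two intervals should agree closely, which supports the use of the normal approximation in this example.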



For an application using real data, a data set related to estimated proportions of nursing diagnoses in children with congenital heart disease was obtained from an earlier study(21) of 45 individuals. These data were used as the basis to define the likelihood function of four nursing diagnoses: Activity intolerance; Ineffective airway clearance; Delayed growth and development; and Ineffective breathing pattern. The final proportion distributions of each diagnosis were calculated based on three possible a priori distributions, in addition to the data from the research mentioned: one non-informative binomial distribution with a parameter of 0.50 and variance of 0.25; one a priori distribution based on the judgment of two nurses experienced in the use of nursing diagnoses and in the care of children with congenital heart disease, who sought a consensus on the probability of each diagnosis and the respective 95% probability interval; and an a priori distribution based on data from a previously published study that used a similar population of 22 children(22). Calculations were developed in a similar way to that presented in the previous section and are summarized in Table 1, which displays the final estimated proportions of the diagnoses Activity Intolerance (AI), Ineffective Airway Clearance (IAC), Delayed Growth and Development (DGD) and Ineffective Breathing Pattern (IBP) in children with congenital heart disease, based on three a priori distributions.

As the sample size was moderate, the posterior distribution was strongly influenced by the clinical data (likelihood function), and the point estimates and credibility intervals did not differ among the three a priori distributions. To give an example, it can be affirmed with 95% credibility that the probability that children with congenital heart disease will present the nursing diagnosis Activity intolerance lies between 83% and 85%. The interval is quite narrow, indicating the good precision of our estimate. Similar reasoning is possible for the other diagnoses.



The axiomatic bases of Bayesian premises permit direct comparisons of nursing diagnoses in different realities, with a view to establishing whether the information found can be used as valid information in the form of a prior distribution. Then, the available knowledge and research data can be summarized in a posterior distribution that presents point and interval estimates of the actual probability of a nursing diagnosis, providing an intuitive interpretation that is closer to reality.

Thus, the main advantage of using Bayesian methods is to use all available knowledge about a phenomenon to express it jointly, in the form of a single probability distribution, from which all relevant information for the study can be extracted, in a gradual process of reducing uncertainty and learning about the study phenomenon. Classical statistical methods treat each study developed on the same theme independently, ignoring the existence of previous data. In nursing diagnosis research, the application of Bayesian methods can be useful to analyze the probability of diagnoses in specific groups, as well as to analyze rare events (whether the nursing diagnosis or the baseline disease for which the nursing diagnoses are studied). Classical statistical methods depend on relatively large sample sizes, which are hard to get in many of these situations.

Other advantages include: the independence of the sample size used (it is obvious that, in small samples, the credibility intervals calculated will be larger, indicating greater uncertainty on the study phenomenon) and the non-use of common restrictive premises in the application of statistical tests in the frequentist paradigm. In addition, the Bayesian paradigm operates with the uncertainty concept, permitting its application in approaches that aim to discuss the accuracy, sensitivity and specificity of nursing diagnoses, as important themes in the determination of relevant clinical indicators.
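The remark about small samples can be illustrated with a short sketch. It assumes a Beta prior with the parameters 9 and 14 from the theoretical example and a normal approximation to the posterior distribution, which is rough for the smallest sample; the sample sizes and the observed proportion of 20% are hypothetical.

```python
from math import sqrt


def posterior_interval(alpha: float, beta: float, k: int, n: int,
                       z: float = 1.96) -> tuple[float, float]:
    """Normal-approximation 95% credibility interval of the Beta posterior."""
    a, b = alpha + k, beta + n - k
    mean = a / (a + b)
    sd = sqrt(mean * (1.0 - mean) / (a + b + 1.0))
    return mean - z * sd, mean + z * sd


widths = {}
for n in (20, 200, 2000):
    k = n // 5                      # hypothetical: 20% present the diagnosis
    low, high = posterior_interval(9, 14, k, n)
    widths[n] = high - low
    print(f"n = {n:4d}: interval {low:.3f} - {high:.3f} (width {widths[n]:.3f})")
```

The analysis can be carried out for any of these sample sizes, but the interval narrows as n increases, reflecting the reduction in uncertainty.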

Despite the advantages noted, it is important to clarify that difficulties and limitations are attached to the use of Bayesian analysis. These limitations include: the need for knowledge of statistical distributions and calculus; the use of specific software with interfaces that are not very user-friendly; and the need to establish a prior distribution based on preliminary knowledge. The latter is not always easy, due to the absence of preliminary information. In some situations, uniform distributions are used or the prior distribution is defined subjectively. This may not adequately represent reality, however, and may lead to errors in the analysis. Methodological bias in Bayesian studies is the same as in studies using classical statistical analysis. In addition, considerable attention should be paid when information from previous studies is incorporated in a priori distributions. The incorporation of research of low methodological quality can influence final posterior distribution estimates.

Despite the whole notation used in this paper, it should be clarified that our considerations correspond to the fundamentals of Bayesian theory, described specifically for the analysis of nursing diagnoses. Our description is limited to the isolated analysis of a human response, with a view to presenting the bases of the Bayesian paradigm.

More specifically, the methodological process of a Bayesian analysis of nursing diagnoses should include the description of the human response of interest, prior knowledge on that response, according to the prevalence found in preliminary studies, the prior probabilistic model chosen and its justification, the respective initial and final distributions and likelihood of reference, and the numerical conclusions that can be extracted. The presented application example describes each of these steps, until the final probability interval is obtained.



1. Lunney M. Critical thinking and accuracy of nurses' diagnoses. Int J Nurs Terminol Classif. 2003;14(3):96-107.

2. Mackenzie SJ, Laschinger HKS. Correlates of nursing diagnosis quality in public health nursing. J Adv Nurs. 1995;21(4):800-8.

3. Thompson C. Clinical decision making in nursing: theoretical perspectives and their relevance to practice - a response to Jean Harbison. J Adv Nurs. 2001;35(1):134-7.

4. Offredy M. The application of decision making concepts by nurse practitioners in general practice. J Adv Nurs. 1998;28(5):988-1000.

5. Lunney M. Critical thinking and accuracy of nurses' diagnoses. Part I: risk of low accuracy diagnoses and new views of critical thinking. Rev Esc Enferm USP. 2003;37(2):17-24.

6. Dijkstra A, Tiesinga LJ, Plantinga AL, Veltman G, Dassen TWN. Diagnostic accuracy of the Care Dependency Scale. J Adv Nurs. 2005;50(4):410-6.

7. Thompson C. Clinical experience as evidence in evidence-based practice. J Adv Nurs. 2003;43(3):230-7.

8. Harbison J. Clinical decision making in nursing: theoretical perspectives and their relevance to practice. J Adv Nurs. 2001;35(1):126-33.

9. Penrod J. Refinement of the concept of uncertainty. J Adv Nurs. 2001;34(2):238-45.

10. Buckingham CD, Adams A. Classifying clinical decision making: a unifying approach. J Adv Nurs. 2000;32(4):981-9.

11. Dowding D, Thompson C. Measuring the quality of judgement and decision-making in nursing. J Adv Nurs. 2003;44(1):49-57.

12. Harbison J. Clinical judgment in the interpretation of evidence: a Bayesian approach. J Clin Nurs. 2006;15(12):1489-97.

13. Congdon P. Applied Bayesian modelling. Chichester: John Wiley & Sons; 2003.

14. Iversen GR. Bayesian statistical inference. London: Sage; 1984.

15. Bernardo JM. Bayesian statistics. In: Viertl R. Encyclopedia of Life Support Systems (EOLSS): probability and statistics. Oxford: UNESCO; 2003. p. 1-45.

16. Gelman A, Carlin JB, Stern HS, Rubin DB. Bayesian data analysis. London: Chapman & Hall/CRC; 2004.

17. Spiegelhalter DJ, Abrams KR, Myles JP. Bayesian approaches to clinical trials and health-care evaluation. Chichester: John Wiley & Sons; 2004.

18. O'Hagan A. Kendall's advanced theory of statistics: Bayesian inference. London: Arnold; 2004. v. 2B.

19. Lindley DV. Making decisions. London: John Wiley & Sons; 1992.

20. Bernardo JM. Reference analysis. In: Dey DK, Rao CR. Handbook of statistics. Amsterdam: Elsevier; 2005. p. 17-90.

21. Silva VM, Lopes MVO, Araujo TL. Estudio longitudinal de los diagnósticos enfermeros identificados en niños com cardiopatías congénitas. Enferm Clin. 2006;16(4):176-83.

22. Silva VM, Lopes MVO, Araujo TL. Diagnósticos de enfermería y problemas colaboradores en niños con cardiopatías congênitas. Rev Mex Enferm Cardiol. 2004;12(2):50-5.



Marcos Venícios de Oliveira Lopes
Rua Esperanto, 1055 - Vila União
CEP 60410-620 - Fortaleza, CE, Brazil

Received: 07/03/2011
Approved: 12/22/2011

Creative Commons License All the contents of this journal, except where otherwise noted, is licensed under a Creative Commons Attribution License