Introduction

The first mathematical formulation using the Bayesian method is attributed to Thomas
Bayes, a British Presbyterian minister. Very little is known about his personal
history. It is believed that he was born around 1701 in Hertfordshire, England and
died in 1761 in Tunbridge. Many facts about his life are speculation such as the
exact date of his birth and the authorship of a book on Theology entitled
*Divine Benevolence: or an Attempt to Prove That the Principal End of the
Divine Providence and Government is the Happiness of His Creatures*,
that concerned the motive behind God’s actions in making the world. In 1719, he
began his studies of logic and theology at the University of Edinburgh. The only
scientific work published during his lifetime was *The Doctrine of
Fluxions*, in 1736, in which he defended the logical foundation of Isaac
Newton’s calculus. Two years after his death, his friend Richard Price (1723-1791)
presented the Royal Society with a manuscript authored by Thomas Bayes entitled
*An Essay Towards Solving a Problem in the Doctrine of Chances*
^{1}. Price said he found the essay
among Bayes’ papers and in his opinion it “*has great merit and well deserves
to be preserved*” ^{2} (p.
451). The essay offered the first clear solution to a problem of inverse
probability, where Bayes described how the probability of an event, given that a
certain condition has been observed, can be calculated from the probability of that
condition given the event and the prior probability of the event. This
formula is known as Bayes’ theorem. It is interesting to note that Richard Price
believed that Bayes’ theorem was based on theological arguments and it could prove
the existence of God ^{3}. In 1748,
the Scottish empiricist philosopher David Hume published a book entitled *An
Enquiry Concerning Human Understanding*. In chapter ten of this work
entitled *Of Miracles*, Hume wrote his famous argument against
miracles ^{4}. Today, some authors
claim that Hume’s statements were based on arguments taken from Bayes’ theorem
^{5}^{,}^{6}^{,}^{7}. Despite these philosophical ideas, Bayes’ essay
seemed to have been forgotten until the publication of the book entitled
*Théorie Analytique des Probabilités* by the French mathematician
and astronomer Pierre-Simon Laplace, in 1812. It is believed that Laplace was not
familiar with the work of Thomas Bayes and he independently developed a more formal
version of Bayes’ theorem.

Currently, Bayesian ideas are used in many fields of technology and research. For example,
email software uses Bayesian filters to classify messages and detect spam ^{8}. Another example is
found in robotics: using a Bayesian framework ^{9} and a Bayesian network classifier, a robot
distinguished terrestrial rocks from meteorites in the first robotic identification
of a meteorite, in 2000, at Elephant Moraine in Antarctica ^{10}. In addition, NASA’s Mars
Exploration Rover mission has been using Bayesian classification algorithms to study
the physical properties of the surface of Mars ^{11}. Nowadays, Bayesian methods are widely used in many
different fields of research, such as astronomy ^{12}, economics and econometrics ^{13}^{,}^{14}, marketing ^{15}, actuarial science ^{16}, psychological research ^{17}, genetics ^{18}^{,}^{19}, evolutionary biology ^{20}, bioinformatics ^{21}, demography ^{22}, social sciences ^{23}, public health ^{24}, drug development ^{25} and clinical trials ^{26}^{,}^{27}. The use of Bayesian methods in epidemiological
studies has been discussed by several authors ^{28}^{,}^{29}^{,}^{30}^{,}^{31}, and Congdon ^{32} claims that the Bayesian approach is very useful for
modeling epidemiological datasets, since it allows the control of possible
confounding influences on disease outcomes and the establishment of causal and
dose-response relationships. In addition, Dunson ^{28} showed that the use of Bayesian techniques in
epidemiological studies is a powerful mechanism for incorporating information from
previous studies and controlling confounding. Appropriate methods for dealing with
interactions between variables and confounding effects are essential for
epidemiological studies, and in this respect Bayesian methods can be very useful.
Bayesian methods represent a totally different way of thinking about research
methods where the researcher’s previous knowledge and experience have an important
effect on inference and decision-making.

The traditional approach to statistical inference is the frequentist (or classical) technique, where results are interpreted in terms of the frequency of occurrence of an event observed in a hypothetically large number of repetitions of the experiment. Frequentist inferences are based only on the observed sample data, while Bayesian inference assumes that prior knowledge can be formally incorporated into the analytical process. We can therefore say that the Bayesian research method is based both on an empirical world represented by the sample data and on human reasoning represented by the accumulated experience of the researcher.

In the present article, we present an overview of the Bayesian approach together with a brief description of the Bayesian statistical inference procedure and comparison with the standard frequentist approach. We also discuss the advantages of the Bayesian approach over the traditional research method applied to the analysis of epidemiological data and discuss some areas of epidemiology which should benefit from the use of Bayesian methods in coming years.

Are we in the Bayesian era?

In 1996, David Moore published an article ^{33} that discussed the possibility of teaching Bayesian
inference on a first statistics course for students from different backgrounds. He
argued that Bayesian methods were rarely used in practice and teaching them would
deprive students of instruction about more common statistical methods. It is
possible that this statement was based on limitations caused by the time needed for
software and hardware to analyze data using a Bayesian approach. Major advances in
software and hardware in the last 20 years have been one of the factors that has led
to a sharp increase in the use of Bayesian methods. To obtain an idea of the current
use of Bayesian methods in health research, a search was made in PubMed using the
keyword “Bayesian”. The annual number of articles is presented graphically in Figure 1 which shows that the first article
using the term “Bayesian” indexed in PubMed was published in 1963 ^{34}. After this first publication, we
observe a very modest increase in the number of articles up to the middle of the
1980s, after which a large increase can be observed. It is important to remember
that portable personal computers only became popular in the middle of the 1980s,
significantly contributing to the use of Bayesian methods, since the approach
usually depends on computational algorithms. A large increase in the number of
published articles using the term “Bayesian” can be observed toward the end of
the 20^{th} century. It is possible that this increase was due to the
emergence of new software adapted to Bayesian analysis, such as the free software
WinBUGS ^{35}. This software uses
simulation methods, such as the popular Markov Chain Monte Carlo (MCMC) methods
^{36}, and was possibly one of the most
important computational advances to have popularized the use of Bayesian
methodology. The first version of WinBUGS for Windows was made available in 1997
^{37}. Today, OpenBUGS is the
open-source version of WinBUGS and can be freely downloaded from the project website
(http://www.openbugs.info/w/Downloads).

In 2010, 2.56 in every 1,000 articles indexed in PubMed contained the term Bayesian
(Figure 1), showing the growing use of
Bayesian methods in health research since the publication of Moore’s article ^{29} and suggesting that Bayesian
statistics is indeed an important topic for students who are starting their training
as researchers. The advantages of the use of Bayesian methods in specific
fields of knowledge, such as genetics ^{18}, oncology ^{38} and parasitology ^{39}, have also been discussed in a number of research
articles available in the literature.

One of the reasons for the widespread use of Bayesian methods may be related to the ease with which statistical inferences can be made even with complex problems. It is therefore expected that the use of Bayesian methods will continue to increase in response to the demands of ever more complex problems in the health field.

A practical example: estimating disease prevalence

In order to illustrate Bayesian inference procedures, let us consider a simple
example in which we estimate the prevalence θ of a disease among the inhabitants of
a given community. A parameter is defined as an unknown numerical characteristic of
a population. Prevalence θ is therefore a parameter and may be estimated using a
frequentist or Bayesian approach. First, let us describe how this is done using the
frequentist approach. A sample of size *n* is represented by a
probability function, defined as the likelihood function and denoted by
*f*(**x**|θ). In the frequentist
approach, inference is based only on the likelihood function.

*X*_{i} is a random binary variable which assumes the value of 1 if the *i*-th individual has the disease of interest and the value of 0 if the *i*-th individual does not have the disease (*i* = 1, …, *n*). Thus, the probability of the *i*-th individual having the disease is *P*(*X*_{i} = 1) = θ, and the probability of the individual not having the disease is *P*(*X*_{i} = 0) = 1 – θ, where 0 ≤ θ ≤ 1. In this case, we say that *X*_{i} follows a Bernoulli distribution with success probability θ, and its probability function is given by *P*(*X*_{i} = *x*_{i}) = θ^{*x*_{i}}(1 – θ)^{1 – *x*_{i}}, where *x*_{i} assumes the value 0 or 1. Assuming that *X*_{1}, *X*_{2}, …, *X*_{n} are independent random variables, that is, assuming that an individual having the disease does not affect the probability of another individual having the disease, the likelihood function is given by

$$f(\mathbf{x} \mid \theta) = \prod_{i=1}^{n} \theta^{x_i}(1-\theta)^{1-x_i} = \theta^{\sum_{i=1}^{n} x_i}(1-\theta)^{\,n-\sum_{i=1}^{n} x_i}.$$

The maximum-likelihood estimation (MLE) procedure is a frequentist method commonly
used to estimate the parameters of a statistical model ^{40}^{,}^{41}. Using the MLE method the estimate of a parameter θ is
given by the value of this parameter in the parameter space (set of all possible
values of the parameter) that maximizes the likelihood function
*f*(**x**|θ). For
this purpose, we can use differential calculus tools to obtain an estimator
of θ. In practice, it is often more convenient to maximize
the logarithm of the likelihood function, also called the log-likelihood function.
If we set the first derivative of the log-likelihood function equal to zero, the
maximum likelihood estimator of θ is given
by

$$\hat{\theta} = \frac{1}{n}\sum_{i=1}^{n} x_i,$$

that is, the proportion of individuals in the sample who have the disease.

This is a well-known expression which can be found in many widely used epidemiology textbooks. Inference is then based on a hypothetical series of data sets collected under identical conditions. For example, a 95% confidence interval is a range of values calculated from the sample observations by a procedure such that, if all possible random samples were drawn from the same population using the same sampling scheme, 95% of the resulting intervals would contain the true value of the parameter. However, a 95% confidence interval does not mean that there is a 95% probability that the calculated interval contains the true value of the parameter. Although this interpretation is quite intuitive, it is not valid, since the frequentist method treats the parameter as a fixed quantity and cannot assign probabilities to it.
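The repeated-sampling interpretation of the confidence interval can be illustrated with a small simulation. The following is only a sketch under stated assumptions: it uses the normal-approximation (Wald) interval, which is one of several possible interval procedures, a hypothetical true prevalence of 0.22, and illustrative function names that do not come from the original analysis.

```python
import math
import random


def mle_prevalence(x):
    """Maximum-likelihood estimate of theta: the sample proportion."""
    return sum(x) / len(x)


def wald_interval(x, z=1.96):
    """Approximate 95% confidence interval (normal approximation)."""
    n = len(x)
    p = mle_prevalence(x)
    half_width = z * math.sqrt(p * (1 - p) / n)
    return max(0.0, p - half_width), min(1.0, p + half_width)


def coverage(theta, n, reps=2000, seed=1):
    """Fraction of simulated samples whose interval contains the true theta."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        # Draw n independent Bernoulli(theta) observations.
        x = [1 if rng.random() < theta else 0 for _ in range(n)]
        lo, hi = wald_interval(x)
        hits += lo <= theta <= hi
    return hits / reps


# Hypothetical sample: 22 diseased individuals out of 100.
sample = [1] * 22 + [0] * 78
print(mle_prevalence(sample))
print(wald_interval(sample))
# Proportion of intervals covering the (assumed) true prevalence of 0.22.
print(coverage(theta=0.22, n=100))
```

The coverage value refers to the procedure across many samples, not to any single calculated interval, which is the distinction made in the paragraph above.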

The frequentist approach assumes that the parameter of interest is a fixed quantity, while in the Bayesian approach parameter uncertainty is represented by a probability distribution. From a statistical viewpoint, this is perhaps the most striking difference between the traditional frequentist approach and the Bayesian approach and is a cause of much controversy, since frequentist statisticians do not accept that parameters can be represented by random variables.

In Bayesian analysis, a prior probability distribution for all parameters in the
statistical model is necessary. Since the prevalence of the disease is
restricted to the interval (0,1), a plausible prior probability distribution for
the parameter θ is the beta distribution ^{42}, a flexible probability distribution which can take
many forms depending on the values of its parameters *a* and *b*. In
this case, the prior distribution *f*(θ) for θ has a probability
density function given by

$$f(\theta) = \frac{1}{B(a,b)}\,\theta^{a-1}(1-\theta)^{b-1}, \qquad 0 < \theta < 1,$$

where *a* > 0 and *b* > 0 are known values,
*B*(*a*,*b*) is the beta function, and
*a* and *b* are the hyperparameters of the prior
distribution. Hyperparameters may be chosen by a panel of experts or by using
results from previous studies. For example, let’s say that an experienced
epidemiologist believes that the prevalence of the disease in a specified population
is around 15% and is almost certain that prevalence is no less than 2% and no
greater than 30%. Based on this information, we now need to find the values of the
constants *a* and *b*. One possible strategy is to
set the mode of the beta distribution to 15% and its standard deviation to
one quarter of the distance between the limits 2% and 30%, that is, (30% – 2%)/4 = 7% (for further details see
Browne 2001^{43}), thus giving
*a* = 4.96 and *b* = 23.45. The graph in Figure 2 shows the probability density function
of the beta distribution with parameter values of 4.96 and 23.45, mathematically
representing the prior information given by the epidemiologist. It can be observed
that the probability density function reaches its maximum at a prevalence of 15% (the mode). In
addition, it can be seen that the probability of prevalence being higher than 30% or
lower than 2% is low. This process is called prior probability elicitation ^{44}.
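The elicitation strategy described above can be sketched numerically. The code below is a minimal illustration, assuming that the hyperparameters are obtained by matching the mode (15%) and the standard deviation (7%) of the beta distribution; the bisection search and the function names are choices made for this sketch and are not necessarily the procedure used in the cited reference.

```python
import math


def beta_sd(a, b):
    """Standard deviation of a Beta(a, b) distribution."""
    return math.sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))


def elicit_beta(mode, sd, t_max=1e6):
    """Find (a, b) with the requested mode and standard deviation.

    With t = a + b - 2, the mode constraint (a - 1) / (a + b - 2) = mode
    gives a = 1 + mode * t and b = 1 + (1 - mode) * t, so only t is unknown.
    The standard deviation shrinks as t grows, so t is found by bisection.
    """
    lo, hi = 0.0, t_max
    for _ in range(200):
        mid = (lo + hi) / 2
        a, b = 1 + mode * mid, 1 + (1 - mode) * mid
        if beta_sd(a, b) > sd:
            lo = mid   # distribution still too dispersed: increase t
        else:
            hi = mid
    t = (lo + hi) / 2
    return 1 + mode * t, 1 + (1 - mode) * t


# Expert opinion from the text: mode 15%, limits 2% and 30%.
a, b = elicit_beta(mode=0.15, sd=(0.30 - 0.02) / 4)
print(round(a, 2), round(b, 2))  # close to the values quoted in the text
```

Other elicitation rules, for example matching two quantiles instead of the mode and the standard deviation, would give somewhat different hyperparameters.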

The likelihood function plays a key role in statistical inference in both frequentist
and Bayesian approaches. Bayes’ theorem states that the distribution of θ given
the data (named the posterior distribution) is proportional to the product of the prior
distribution *f*(θ) and the likelihood function *f*(**x**|θ). Bayes’
formula establishes that

$$f(\theta \mid \mathbf{x}) = k\, f(\theta)\, f(\mathbf{x} \mid \theta) \propto \theta^{\,a+\sum_{i=1}^{n} x_i-1}(1-\theta)^{\,b+n-\sum_{i=1}^{n} x_i-1},$$

where *k* is a constant value known as the normalizing constant. The
posterior distribution for θ also follows a beta distribution, with parameters
*a* + Σ*x*_{i} and *b* + *n* – Σ*x*_{i}, since the expression
for *f*(θ|**x**) given above is in the form of a beta
distribution. When the posterior distribution *f*(θ|**x**) is in the same family as the prior distribution
*f*(θ), we say that *f*(θ) is a conjugate prior
distribution for θ.

Let us suppose a sample of size *n* = 100 individuals from the population of interest, of which 22 individuals have the disease of interest. The maximum likelihood estimate for θ is given by 22/100 = 22%. Considering the Bayesian approach, the posterior distribution for θ is proportional to

$$\theta^{4.96+22-1}(1-\theta)^{23.45+100-22-1} = \theta^{25.96}(1-\theta)^{100.45},$$

since *a* = 4.96, *b* = 23.45, Σ*x*_{i} = 22 and *n* = 100. Thus, *f*(θ|**x**) follows a beta distribution with parameters 26.96 and 101.45.

Figure 3 compares the prior and posterior
distributions for θ. We note that the curve that represents the posterior
distribution is the one with lower dispersion, suggesting that the posterior
distribution provides more information about θ than the prior distribution.
Considering that the mean of a random variable that follows a beta distribution with
parameters *a* and *b* is given by
*a*/(*a*+*b*), the Bayesian
estimate of disease prevalence is given by 26.96/(26.96+101.45) = 21%.
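The conjugate update used in this example amounts to two additions, as the following short sketch shows. The function names are illustrative only; the numbers are those given in the text.

```python
def beta_binomial_update(a, b, cases, n):
    """Posterior Beta parameters after observing `cases` diseased out of n."""
    return a + cases, b + (n - cases)


def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)


# Prior Beta(4.96, 23.45); data: 22 diseased individuals out of 100.
a_post, b_post = beta_binomial_update(4.96, 23.45, cases=22, n=100)
print(round(a_post, 2), round(b_post, 2))    # 26.96 101.45
print(round(beta_mean(a_post, b_post), 3))   # 0.21
```

The posterior mean (about 21%) lies between the prior mean, 4.96/(4.96+23.45) ≈ 17.5%, and the maximum likelihood estimate of 22%, and is closer to the latter because the sample carries more information than the prior.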

The graph in Figure 4 shows the posterior distribution for θ, where the gray area corresponds to 95% of the total area under the curve. This area represents the 95% credible interval, which in this case ranges from 14.4% to 28.4%. The credible interval is the Bayesian counterpart of the frequentist confidence interval, but unlike the latter it has a direct probabilistic interpretation: given the data and the prior distribution, there is a 95% probability that the true prevalence θ lies within this range.
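A credible interval of this kind can be approximated without specialized software. The sketch below assumes an equal-tailed interval (2.5% of the posterior probability in each tail) and obtains the quantiles by simple numerical integration of the beta density; both the interval type and the integration scheme are assumptions of this illustration, and statistical packages provide exact quantile functions.

```python
import math


def beta_logpdf(theta, a, b):
    """Logarithm of the Beta(a, b) density at theta."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return log_norm + (a - 1) * math.log(theta) + (b - 1) * math.log(1 - theta)


def beta_quantiles(a, b, probs, grid=100_000):
    """Approximate Beta(a, b) quantiles with a midpoint-rule integration."""
    h = 1.0 / grid
    mids = [(i + 0.5) * h for i in range(grid)]
    mass = [math.exp(beta_logpdf(t, a, b)) * h for t in mids]
    total = sum(mass)
    targets = sorted(probs)
    quantiles, cumulative, j = [], 0.0, 0
    for t, m in zip(mids, mass):
        cumulative += m / total
        # Record the grid point where each target probability is reached.
        while j < len(targets) and cumulative >= targets[j]:
            quantiles.append(t)
            j += 1
    return quantiles


# Posterior Beta(26.96, 101.45) from the prevalence example.
lower, upper = beta_quantiles(26.96, 101.45, [0.025, 0.975])
print(round(lower, 3), round(upper, 3))
```

A highest posterior density interval would be an alternative choice and would differ slightly from the equal-tailed interval because the posterior distribution is not symmetric.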

The use of noninformative prior distribution

The use of a noninformative prior distribution is suggested when there is total
ignorance about the parameter of interest. When using noninformative prior
distribution for the parameter θ the Bayesian estimates tend to be close to the
corresponding maximum-likelihood estimates, since in this case *f*(θ) has minimal impact on the posterior distribution *f*(θ|**x**). There are
different techniques for constructing noninformative prior distributions, such as
the Jeffreys prior ^{45}, based on the so-called Fisher information, a concept used in the theory of maximum-likelihood estimation. Using the problem presented in the previous section, the Jeffreys prior is defined in terms of a beta distribution with parameters 0.5 and 0.5 (see, for example, Box & Tiao ^{46}). In this case, the posterior distribution *f*(θ|**x**) follows a beta distribution with parameters 22.5 and 78.5, and the Bayes estimate of prevalence is 22.5/(22.5+78.5) = 22.3%. Noninformative prior distributions are useful when we have no knowledge about the parameter of interest or when a more objective analysis is required. Bayesian analysis using noninformative prior distribution has a number of advantages over maximum-likelihood estimation in situations where the likelihood function is particularly complex and traditional optimization methods are not well suited to such problems.

Table 1 illustrates frequentist and Bayesian estimates (posterior means) of prevalence based on different sample sizes and choices of prior distribution for θ. For all assumed sample sizes we fixed the observed proportion of individuals with the disease. We also assigned beta (4.96, 23.45) as an informative prior distribution for θ, beta (0.5, 0.5) as a noninformative prior distribution and beta (10,10) as an example of an inadequate prior distribution based on an implausible expert opinion. Bayes estimates based on the noninformative prior distribution are similar to the frequentist estimates. As sample size increases, we can observe that frequentist and Bayes estimates become more similar, even in the case of the inadequate prior distribution for θ. This occurs because in large samples the contribution of the likelihood function to the posterior distribution is relatively greater in relation to the adopted prior distribution for the parameter.
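The pattern described for Table 1 can be reproduced in outline with a few lines of code. This is a sketch under explicit assumptions: the observed proportion is taken to be 22% at every sample size and the sample sizes listed are hypothetical, so the output is not a reproduction of the published table.

```python
def posterior_mean(a, b, cases, n):
    """Mean of the Beta(a + cases, b + n - cases) posterior distribution."""
    return (a + cases) / (a + b + n)


PRIORS = {
    "informative beta(4.96, 23.45)": (4.96, 23.45),
    "noninformative beta(0.5, 0.5)": (0.5, 0.5),
    "inadequate beta(10, 10)": (10.0, 10.0),
}

PROPORTION = 0.22                           # assumed observed proportion
SAMPLE_SIZES = (50, 100, 500, 1000, 5000)   # hypothetical sample sizes


def prevalence_table(proportion=PROPORTION, sizes=SAMPLE_SIZES):
    """Frequentist and Bayesian estimates for each sample size."""
    rows = []
    for n in sizes:
        cases = round(proportion * n)
        row = {"n": n, "MLE": cases / n}
        for label, (a, b) in PRIORS.items():
            row[label] = posterior_mean(a, b, cases, n)
        rows.append(row)
    return rows


for row in prevalence_table():
    print({key: round(value, 3) for key, value in row.items()})
```

As the sample size grows, all three posterior means approach the maximum likelihood estimate, including the one obtained with the inadequate prior distribution.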

All calculations and simulations were carried out using the R software (The R Foundation for Statistical Computing, Vienna, Austria; http://www.r-project.org).

Advantages of Bayesian methods

Several authors claim that the main advantage of the Bayesian approach over the
frequentist method is that it allows the incorporation of prior knowledge by
specifying appropriate prior probabilities ^{47}^{,}^{48}. However, the advantages of Bayesian methods are not
limited to the possibility of incorporating out-of-sample information into the
analyses. For example, Bayesian methods are especially useful for statistical
inference of complex models which present significant difficulties for frequentist
methods. Calculating the maximum of very complex likelihood functions can be a
difficult task in practice, despite the advances in computer software and hardware
in recent years. In such situations, the frequentist approach usually involves
numerical tools, such as the traditional Newton-Raphson method. However, convergence
problems may occur or solutions may be highly dependent on initial values. Bayesian
methods can overcome this problem by using MCMC methods ^{49}^{,}^{50}, which allow samples to be simulated from the posterior distribution of the
parameters of interest. In this approach, inference is therefore based on the
simulated sample, remembering that the Bayesian approach treats the parameters as random
variables. In some special situations, this procedure can be simplified by the use
of a Bayesian technique based on a procedure called data augmentation introduced by
Tanner & Wong ^{51}. This
procedure “augments” the observed data to simplify the likelihood function.
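To give an idea of how MCMC simulation works, the sketch below applies a random-walk Metropolis algorithm, one of the simplest MCMC methods, to the prevalence example. It is an illustration only: the step size, number of iterations, burn-in length and starting value are arbitrary choices, and for this conjugate model the posterior distribution is available in closed form, so simulation is not actually required.

```python
import math
import random


def log_posterior(theta, a=4.96, b=23.45, cases=22, n=100):
    """Unnormalized log posterior: beta prior times Bernoulli likelihood."""
    if not 0.0 < theta < 1.0:
        return -math.inf
    return ((a + cases - 1) * math.log(theta)
            + (b + n - cases - 1) * math.log(1 - theta))


def metropolis(n_iter=20000, burn_in=2000, step=0.05, start=0.5, seed=42):
    """Random-walk Metropolis sampler for the prevalence theta."""
    rng = random.Random(seed)
    theta, logp = start, log_posterior(start)
    draws = []
    for i in range(n_iter):
        proposal = theta + rng.gauss(0.0, step)
        logp_new = log_posterior(proposal)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, logp_new - logp)):
            theta, logp = proposal, logp_new
        if i >= burn_in:
            draws.append(theta)
    return draws


draws = metropolis()
print(sum(draws) / len(draws))   # simulated posterior mean of theta
```

Summaries such as the posterior mean or a credible interval are then computed directly from the simulated values, which is why the same approach extends to models whose posterior distribution has no closed form.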

Another important aspect of the frequentist inference approach concerns the
identifiability of the parameters of a given model. The problem of identifiability
occurs when there are more parameters than degrees of freedom. In such situations,
parameter estimation based on the frequentist approach is a difficult task. Degrees
of freedom can be understood as “*the number of independent units of
information in a sample relevant to the estimation of a parameter or calculation
of a statistic*” ^{52}
(p. 118). A practical example is the assessment of new diagnostic tests in the
absence of a gold standard. Joseph et al. ^{53} showed that in these types of situations, where there
are more parameters (sensitivity, specificity and disease prevalence) than
information from the data, Bayesian methods are able to provide estimates for these
measures.

Although the literature includes studies that used Bayesian hypothesis testing, the
main focus of the Bayesian method is estimating parameters and not hypothesis
testing. For example, given two hypotheses H_{1} and H_{2}, a
Bayesian hypothesis test compares the probability of the observed data D given
H_{1}, denoted by P(D|H_{1}), and the probability of the
observed data D given H_{2}, denoted by P(D|H_{2}). The ratio BF =
P(D|H_{1})/P(D|H_{2}) is the Bayes factor ^{54}, which quantifies the evidence from data for
H_{1} in relation to H_{2}. It should be noted that this
procedure is different from the traditional null hypothesis significance testing.
While the results of frequentist hypothesis tests are usually expressed as p-values,
the results of Bayesian hypothesis tests are expressed as Bayes factors. P-values
are difficult to interpret and are regularly misinterpreted by health researchers,
while Bayes factors are easier to interpret.
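A Bayes factor can be computed in closed form for the prevalence example when each hypothesis is expressed as a beta prior distribution for θ. The sketch below is a hypothetical illustration: it takes H_{1} to be the informative prior beta (4.96, 23.45) and H_{2} to be the implausible prior beta (10, 10), a pairing chosen here only for demonstration, and uses the fact that the probability of the data under a beta prior is a beta-binomial probability.

```python
import math


def log_binomial_coefficient(n, x):
    """Logarithm of the binomial coefficient C(n, x)."""
    return math.lgamma(n + 1) - math.lgamma(x + 1) - math.lgamma(n - x + 1)


def log_beta_function(a, b):
    """Logarithm of the beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def log_marginal_likelihood(cases, n, a, b):
    """log P(D | H) when hypothesis H assigns theta a Beta(a, b) prior."""
    return (log_binomial_coefficient(n, cases)
            + log_beta_function(a + cases, b + n - cases)
            - log_beta_function(a, b))


def bayes_factor(cases, n, prior_1, prior_2):
    """BF = P(D | H1) / P(D | H2) for two beta prior hypotheses."""
    return math.exp(log_marginal_likelihood(cases, n, *prior_1)
                    - log_marginal_likelihood(cases, n, *prior_2))


# Data from the example: 22 diseased individuals out of 100.
bf = bayes_factor(22, 100, prior_1=(4.96, 23.45), prior_2=(10.0, 10.0))
print(bf)
```

A value of BF greater than 1 indicates that the data are better predicted under H_{1} than under H_{2}; values below 1 favor H_{2}.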

Recent trends in Bayesian analysis

The following is a non-exhaustive list of areas of epidemiological research which should benefit from the use of Bayesian methods over the coming years.

Spatiotemporal modeling

Ecological studies involve the description of the geographical distribution of a
disease or an event of interest and associated factors. In this context, spatial
autoregressive models have been extensively used in data analysis, and a popular
modeling approach is based on the conditionally autoregressive (CAR)
distribution and its generalizations. These models are relatively flexible and
can accommodate different structures of spatial correlation and longitudinal
data, as well as the presence of covariates. The estimation of the parameters of
these models based on frequentist inference methods can be a difficult task due
to the complexity of the likelihood function, and Bayesian methods provide a
convenient alternative to deal with this model structure. This type of modeling
is facilitated by the OpenBUGS software, which allows samples to be
simulated from the CAR distribution and its multivariate extensions ^{55}^{,}^{56}. In a broader sense, these spatiotemporal models
^{57} can be classified as a
type of hierarchical model. Multilevel or hierarchical models are useful for the
analysis of data structured in groups, which is common in epidemiological
studies.

Models based on distributions other than the normal distribution

The use of models based on normal distribution is quite common in epidemiological
studies. There is a belief that nonparametric hypothesis testing should be used
when the normality assumption is not satisfied, but this is not necessarily true
due to the existence of generalized linear models (GLM), a very general class of
statistical models that includes many probability distributions. In addition,
traditional nonparametric tests only provide p-values, while measures of the
size of the association between groups are essential for epidemiological
research. The relationship between discrete dependent variables and explanatory
variables can be explored by using models based on Poisson, binomial, negative
binomial or beta-binomial distributions, depending on the amount of data
dispersion. Count data with excess zeros ^{58} and truncated data ^{59} are also common in epidemiological studies and
specific regression models are required to deal with them. Bayesian methods can
be very useful in these modeling applications, since they enable us to estimate
parameters and related measures of association in complex models or where
asymptotic assumptions are not appropriate due to sparse data or small sample
sizes.

Models for survival data based on more complex distributions

In epidemiological studies, parametric survival models are usually based on the
Weibull, lognormal or gamma distributions. Alternative distributions for
time-to-event data have been proposed in the literature in
recent years, allowing the addition of a parameter representing the proportion
of individuals who are “immune” to the event of interest ^{60}. These distributions are extensions of the usual
distributions and include a greater number of unknown parameters ^{61}^{,}^{62}. Parameter estimation in
survival models based on these distributions can be challenging, especially when
covariates are involved, since asymptotic properties cannot be assured. Bayesian
analysis could be a promising alternative for this type of modeling because
MCMC methods are capable of dealing with the complexity of the resulting
likelihood function.

Multivariate copula models

Copula functions ^{63} are tools
used to construct and simulate multivariate distributions. For example, copula
functions can be used to study the joint distribution of the successive survival
times ^{64}, multiple dependent
diagnostic tests ^{65} or the
association of risk factors for two or more diseases simultaneously. Bayesian
methods can accommodate different copula functions and may therefore be useful
for many epidemiological investigations that use multivariate data.

Concluding remarks

Since Bayesian methods allow the incorporation of relevant prior knowledge or beliefs into the analysis, the researcher is no longer just an observer in the research process and his or her experience becomes an active component in obtaining the inferences of interest. This in itself is often seen as a controversial aspect of Bayesianism, since the traditional scientific method relies on a positivist approach and was designed to avoid subjective analysis. The Bayesian method offers a different way of thinking about research and we believe that it can make a valuable contribution to the development of knowledge in a number of fields apart from epidemiology.

From a statistical viewpoint, we believe that the major advantage of the Bayesian
approach is its extreme flexibility. The availability of MCMC methods allows the
analysis of a wide range of statistical models which could be applied to
epidemiologic research, such as hierarchical models, longitudinal models and more
complex models applied to specific design studies ^{66}^{,}^{67} or unusual data structures.

Currently, good Bayesian analysis software is available, such as OpenBUGS, SAS and several R software libraries. An important advantage of OpenBUGS and R programs is that they are freely available on the internet. However, the use of these programs requires some programming knowledge. Therefore, researchers who are not proficient in computer programming may have some difficulties with Bayesian modeling, and this is an obstacle to popularizing Bayesian methods in epidemiological research. This situation may also be aggravated by a lack of professional statistical support in health research institutions.

Despite these potential difficulties, this study observed a sharp increase in the number of studies using Bayesian methods and this trend looks set to continue. It can therefore be concluded that epidemiologists, clinicians and health professions students interested in a research career should receive appropriate training in Bayesian methods to be able to deal with more complex problems.

Contributors

E. Z. Martinez and J. A. Achcar contributed to project design, the literature review and writing and revision of this article.