INTRODUCTION

Case-control studies are commonly used, especially for studying rare diseases^{1-3}. Both case identification and control selection should be carried out with the greatest possible rigor, because a lack of rigor can lead to systematic errors and consequently to invalid results^{3}. Similarly, if the exposure, the disease or both are misclassified, the association measures, such as the odds ratio (OR), and the conclusions drawn from them will be biased^{1}. In any epidemiological study, the inappropriate classification of the exposure or of the disease is known as misclassification error, which can be divided into two types: non-differential misclassification error (NDME) and differential misclassification error (DME). Misclassification error is usually found in studies that investigate socially unacceptable behaviours or behaviours considered private that can generate shame or stigma^{4}, such as sexual behaviour or the use of psychoactive substances^{5}. It is known that NDME biases the risk estimate (typically represented by an odds ratio in case-control studies) towards the null hypothesis^{2,6-11}. The behavioural trends of NDME have been extensively described, but DME trends are not yet clear^{12}.

Consider a 2 x 2 contingency table in any type of epidemiological study that represents the results observed among non-diseased and diseased individuals as a function of the exposure variable, where *a* and *b* represent, respectively, the number of diseased and non-diseased exposed individuals and *c* and *d* represent the number of diseased and non-diseased unexposed individuals, respectively. Exposure misclassification error occurs when individuals in the diseased group or the non-diseased group are classified as exposed although they were not, or as unexposed although they were^{6,12}. A similar situation would occur if error arose in the classification of the disease. Equations 1 to 4 show how misclassification error affects the frequencies in the 2 x 2 table. These equations have been derived by us using equation 9 from the Vogel and Geffeller paper^{13}.

$$a' = a\,Se_{D} + c\,(1 - Sp_{D}) \qquad \text{Eq. (1)}$$

$$b' = b\,Se_{N} + d\,(1 - Sp_{N}) \qquad \text{Eq. (2)}$$

$$c' = a\,(1 - Se_{D}) + c\,Sp_{D} \qquad \text{Eq. (3)}$$

$$d' = b\,(1 - Se_{N}) + d\,Sp_{N} \qquad \text{Eq. (4)}$$

In these equations, *a* and *a'* indicate, respectively,
the true and observed numbers of exposed diseased individuals, *b* and *b'* the true and observed numbers of exposed non-diseased individuals, *c* and *c'* the true and observed numbers of unexposed diseased individuals, and *d* and *d'* the true and observed numbers of unexposed non-diseased individuals. *Se*_{D} and *Se*_{N} represent the sensitivities for classifying those truly exposed among the diseased and the non-diseased, respectively, and *Sp*_{D} and *Sp*_{N} are the specificities for classifying those truly unexposed among the diseased and the non-diseased, respectively.

The reasoning behind these equations is that the group of diseased individuals includes a number of truly exposed subjects (*a*), a percentage of whom will be classified as such by a certain instrument (laboratory test, survey, etc.). This percentage corresponds to the sensitivity of the instrument for classifying subjects as exposed (*Se*_{D}). In addition, some of the individuals who were truly not exposed (*c*) will be classified by the instrument as exposed. This percentage represents the instrument's false positive rate (1 - *Sp*_{D}). Therefore, among diseased individuals, the number of subjects classified by the instrument as exposed (*a'*) will be equal to the number of individuals correctly classified, *a Se*_{D}, plus the number of individuals incorrectly classified, *c*(1 - *Sp*_{D}). Similarly, the number of diseased subjects classified by the instrument as unexposed (*c'*) is equal to the number of individuals correctly classified as such, *c Sp*_{D}, plus the number of individuals incorrectly classified, *a*(1 - *Se*_{D}). Likewise, the number of subjects classified as exposed and unexposed by an instrument in a non-diseased population (*b'* and *d'*) is determined by the sensitivity and specificity of the instrument in that population (*Se*_{N} and *Sp*_{N}). The subjects classified as exposed (*b'*) will correspond to the number of individuals correctly classified, *b Se*_{N}, plus the number of individuals incorrectly classified, *d*(1 - *Sp*_{N}). Similarly, the subjects classified as unexposed (*d'*) will equal the number of individuals correctly classified, *d Sp*_{N}, plus the number of individuals incorrectly classified, *b*(1 - *Se*_{N}).
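The bookkeeping described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `observed_table` and `odds_ratio` and the example table are hypothetical, and the counts are treated as expected values rather than random draws.

```python
def observed_table(a, b, c, d, se_d, sp_d, se_n, sp_n):
    """Apply equations 1 to 4 to a true 2 x 2 table.

    a, c: truly exposed / unexposed diseased individuals.
    b, d: truly exposed / unexposed non-diseased individuals.
    Sensitivities and specificities are proportions between 0 and 1.
    """
    a_obs = a * se_d + c * (1 - sp_d)   # Eq. (1): diseased classified as exposed
    b_obs = b * se_n + d * (1 - sp_n)   # Eq. (2): non-diseased classified as exposed
    c_obs = a * (1 - se_d) + c * sp_d   # Eq. (3): diseased classified as unexposed
    d_obs = b * (1 - se_n) + d * sp_n   # Eq. (4): non-diseased classified as unexposed
    return a_obs, b_obs, c_obs, d_obs


def odds_ratio(a, b, c, d):
    """Odds ratio ad / cb of a 2 x 2 table."""
    return (a * d) / (c * b)


# Hypothetical true table: 100 diseased and 100 non-diseased, true OR = 6
true = (60, 20, 40, 80)
obs = observed_table(*true, se_d=0.9, sp_d=0.8, se_n=0.9, sp_n=0.8)
print(obs)                                   # roughly (62, 34, 38, 66)
print(odds_ratio(*true), odds_ratio(*obs))   # 6.0 and roughly 3.17
```

Note that the group totals are preserved (*a'* + *c'* = *a* + *c* and *b'* + *d'* = *b* + *d*): misclassification only moves individuals between the exposed and unexposed cells of the same group.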

Non-differential misclassification error occurs when *Se*_{D} = *Se*_{N} and *Sp*_{D} = *Sp*_{N}; otherwise, the bias is considered a differential misclassification error^{12}. The true odds ratio is *OR* = *ad/cb*, and the observed or estimated odds ratio is *OR'* = *a'd'/c'b'*. If the sensitivities and specificities in non-diseased and diseased individuals are all equal to 100%, the observed odds ratio is equal to the true odds ratio (*OR'* = *OR*), and if they are all equal to 50%, the observed odds ratio will be equal to one (*OR'* = 1). A similar situation would occur with other risk measures employed in different epidemiological designs. Using equations 5 to 8, it is possible to use the true table frequencies to replace the corresponding sensitivities and specificities in the following formulas, derived (also by the authors) from the equations described above:

If the sensitivities and specificities of the instrument employed for classifying exposed/unexposed individuals (non-diseased as well as diseased) are known, then researchers can correct this type of error. However, this is not always possible. Therefore, in this article, the effect of misclassification error (differential and non-differential) on odds ratio estimates is simulated using a case-control design. This effect is simulated as a function of the sensitivity and specificity of the exposure classification, for different prevalence rates of the exposure among the controls and different true ORs, to describe how misclassification error behaves and to identify possible trends. These trends may guide the discussion of results in studies of this type when the classification instruments are deficient.
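As a hedged illustration of the correction mentioned above, the relations *a'* = *a Se* + *c*(1 - *Sp*) and *c'* = *a*(1 - *Se*) + *c Sp* can be inverted within one group when *Se* and *Sp* are known. The sketch below is an algebraic inversion written for this purpose, not a reproduction of the authors' own formulas; the function name `correct_group` and the example counts are hypothetical.

```python
def correct_group(exposed_obs, unexposed_obs, se, sp):
    """Recover true exposed/unexposed counts in one group (cases or controls).

    Inverts a' = a*Se + c*(1 - Sp) and c' = a*(1 - Se) + c*Sp.
    Assumes Se + Sp != 1; otherwise the observed counts carry no
    information about the true exposure split.
    """
    j = se + sp - 1  # Youden's index of the classification instrument
    if abs(j) < 1e-12:
        raise ValueError("Se + Sp = 1: the true counts cannot be recovered")
    exposed = (exposed_obs * sp - unexposed_obs * (1 - sp)) / j
    unexposed = (unexposed_obs * se - exposed_obs * (1 - se)) / j
    return exposed, unexposed


# Hypothetical observed counts among cases (38 exposed, 62 unexposed),
# with an assumed Se = 0.8 and Sp = 0.9 for the instrument.
print(correct_group(38, 62, se=0.8, sp=0.9))   # roughly (40, 60)
```

With real data the corrected counts can fall outside the plausible range (for example, below zero) when the assumed sensitivity and specificity do not match the instrument, which is one reason the correction is not always possible in practice.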

MATERIALS AND METHODS

OUTCOME

The OR was used as the measure of risk in our simulated case-control study, and the observed value was calculated from a 2×2 table. To study the effect of misclassification error on the estimation of the OR, we used different scenarios with variations in the sensitivity and specificity of the exposure classification for cases and controls, the prevalence of the exposure among the controls and the OR. The effect was measured according to the bias produced by varying these parameters. By definition, bias is the difference between the estimated OR and the true OR^{14}. Relative bias is this difference divided by the true OR and multiplied by 100, so that it is expressed as a percentage. Negative bias values indicate underestimates of the OR, and positive bias values indicate overestimates. Likewise, bias values close to zero indicate the absence of bias.
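The definition of relative bias can be written as a one-line function. This is a minimal sketch; the name `relative_bias` and the example values are illustrative only.

```python
def relative_bias(or_observed, or_true):
    """Relative bias in percent: 100 * (observed OR - true OR) / true OR."""
    return 100.0 * (or_observed - or_true) / or_true


print(relative_bias(1.5, 2.0))   # -25.0, an underestimate of the true OR
print(relative_bias(3.0, 2.0))   # 50.0, an overestimate of the true OR
```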

STUDY DESIGN

Simulation studies are used in many areas of research and are very useful for understanding the behaviour of a phenomenon under different virtual scenarios that researchers set up using specialized software. They are very common in statistical robustness studies, where they are used to observe the behaviour of an estimator under different scenarios that could occur in reality. Given the similarity between simulation studies and experimental studies, an experimental design approach was used to quantify the effect of misclassification in case-control studies^{15}. For this, we considered six factors:

1. the sensitivity of the exposure classification among the cases;

2. the specificity of the exposure classification among the cases;

3. the sensitivity of the exposure classification among the controls;

4. the specificity of the exposure classification among the controls;

5. the prevalence of the exposure among the controls; and

6. the odds ratio.

Factors 1 to 4 had five levels each (0, 25, 50, 75 and 100%) and factors 5 and 6 had two levels each (5 and 30% for the prevalence and 2 and 7 for the OR). A total of 2,500 scenarios were generated (5^{4} x 2^{2}). One hundred of these scenarios made up the NDME analysis: those in which factors 1 and 3 had the same level and factors 2 and 4 had the same level (5 x 5 x 2 x 2 = 100). The number of factors considered in that analysis was therefore reduced to four. The remaining scenarios (n = 2,400) made up the DME analysis, which retained the six factors stated initially.
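The scenario grid can be reproduced with a short Python sketch. The variable names, the helper `true_table` and the group sizes are assumptions made for illustration; the text does not say how the true tables were built, so the helper simply derives expected counts from the prevalence among controls and the true OR.

```python
from itertools import product

LEVELS = (0.0, 0.25, 0.5, 0.75, 1.0)   # factors 1 to 4 (Se and Sp, cases and controls)
PREVALENCES = (0.05, 0.30)             # factor 5: exposure prevalence among controls
TRUE_ORS = (2, 7)                      # factor 6: true odds ratio

# Each scenario: (Se cases, Sp cases, Se controls, Sp controls, prevalence, OR)
scenarios = list(product(LEVELS, LEVELS, LEVELS, LEVELS, PREVALENCES, TRUE_ORS))

# Non-differential: same sensitivity and same specificity in cases and controls
ndme = [s for s in scenarios if s[0] == s[2] and s[1] == s[3]]
dme = [s for s in scenarios if not (s[0] == s[2] and s[1] == s[3])]

print(len(scenarios), len(ndme), len(dme))   # 2500 100 2400


def true_table(prevalence, true_or, n_cases=1000, n_controls=1000):
    """True 2 x 2 table (a, b, c, d) implied by the prevalence and the OR.

    The group sizes are hypothetical; the odds ratio does not depend on them.
    """
    odds_cases = true_or * prevalence / (1 - prevalence)   # exposure odds among cases
    p_cases = odds_cases / (1 + odds_cases)
    a, c = n_cases * p_cases, n_cases * (1 - p_cases)
    b, d = n_controls * prevalence, n_controls * (1 - prevalence)
    return a, b, c, d
```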

STATISTICAL ANALYSIS

Interaction plots were used to study relative bias as a function of the factors that varied, using the median bias for each combination of factor levels. The effect of the varying factors (i.e., the explanatory variables) on relative bias (i.e., the outcome variable) was studied using linear models at a 0.05 level of statistical significance. Because positive relative bias occurred only with DME and was highly variable, the DME analyses were stratified according to the sign of the bias, and a natural logarithmic transformation was applied to the DME outcome in the linear models. Given the complexity of the DME results, in a subsequent analysis of this type of misclassification error we considered OR estimates with an absolute relative bias of less than 15% as having no bias or moderate bias. This cut-off was the smallest value close to 0% that yielded an adequate number of scenarios for comparisons. For this analysis, we created a categorical variable with three levels:

Negative bias or underestimation of the OR (relative bias values smaller than -15%);

Bias absent (bias values greater than -15% but less than 15%); and

Positive bias or overestimation of the OR (bias values greater than 15%).
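The three-level variable can be sketched as a small function. The name `classify_bias` is hypothetical, and assigning values of exactly -15% or 15% to the "absent" level is an assumption, since the text does not say how those boundary values were handled.

```python
def classify_bias(relative_bias_pct, cutoff=15.0):
    """Categorise a relative bias (in percent) as negative, absent or positive."""
    if relative_bias_pct < -cutoff:
        return "negative"   # underestimation of the OR
    if relative_bias_pct > cutoff:
        return "positive"   # overestimation of the OR
    return "absent"         # includes exactly -15% and 15% (assumption)


print([classify_bias(x) for x in (-40.0, -5.0, 0.0, 12.5, 300.0)])
# ['negative', 'absent', 'absent', 'absent', 'positive']
```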

Spearman's correlation coefficient was used to assess the relationships between the different levels of the DME analysis factors and positive, absent or negative bias. P-values less than or equal to 0.05 were considered significant. The R statistical package (R Core Team (2013). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, URL http://www.R-project.org/) was used for generating the scenarios, producing the results of the simulation and conducting the data analyses. R scripts are available on request from the corresponding author.

RESULTS

NON-DIFFERENTIAL MISCLASSIFICATION ERROR

The bias attributable to NDME was negative in all 100 scenarios, and strongly negative in many of them, indicating that in the presence of NDME in a case-control study the risk is always underestimated, sometimes even beyond the null hypothesis (OR = 1). In the extreme cases of NDME, the observed OR becomes the inverse of the true odds ratio as the sensitivity and specificity of the exposure classification tend towards 0%, approaches one as they tend towards 50%, and tends towards the true OR as they tend towards 100% (Figures 1A and 1B). On the other hand, when the specificity of the exposure classification is 100% and the sensitivity is 0%, the OR is underestimated, but never drops below the value that corresponds to the null hypothesis (OR = 1). If the specificity tends towards 0%, even if the sensitivity is high, the OR is underestimated below the level of the null hypothesis (Figures 1A and 1B). According to the simulations we conducted, when the specificity level is higher than the sensitivity level, an unbiased estimate of the OR is produced (p-value for the interaction between sensitivity and specificity = 0.014).
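The three limiting values described above can be checked by hand. As a short worked sketch, write the non-differential case as *Se*_{D} = *Se*_{N} = *Se* and *Sp*_{D} = *Sp*_{N} = *Sp* (the shared symbols *Se* and *Sp* are shorthand introduced here), so that the observed cells are *a'* = *a Se* + *c*(1 - *Sp*), *b'* = *b Se* + *d*(1 - *Sp*), *c'* = *a*(1 - *Se*) + *c Sp* and *d'* = *b*(1 - *Se*) + *d Sp*. Then:

$$
\begin{aligned}
Se = Sp = 1 &: \quad a' = a,\; b' = b,\; c' = c,\; d' = d &&\Rightarrow\; OR' = \frac{ad}{cb} = OR \\
Se = Sp = \tfrac{1}{2} &: \quad a' = c' = \tfrac{a + c}{2},\; b' = d' = \tfrac{b + d}{2} &&\Rightarrow\; OR' = 1 \\
Se = Sp = 0 &: \quad a' = c,\; b' = d,\; c' = a,\; d' = b &&\Rightarrow\; OR' = \frac{cb}{ad} = \frac{1}{OR}
\end{aligned}
$$

When both values are 0%, the exposure labels are simply swapped within each group, which is why the inverse of the true OR appears.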

When the exposure prevalence among the controls was high and the sensitivity of the
exposure classification was 100%, the OR estimate tended to be unbiased in the
presence of NDME. When the sensitivity was lower, the exposure prevalence showed no
effect (p-value for the interaction between sensitivity and prevalence = 0.20) (Figures 1C and 1D). The opposite effect was observed when we evaluated the relative bias
by combining the specificity of the exposure classification with the prevalence of
the exposure among the controls (p-value for interaction between specificity and
prevalence = 0.52) (Figures 1E and 1F). Finally, we observed less bias in the
estimates of moderate odds ratios. The relative bias increased when the true OR was
greater, independent of the sensitivity and specificity (p-values for interaction
between sensitivity *versus* OR and specificity
*versus* OR = 0.88 and 0.13, respectively) (Figures 1G and 1H).

DIFFERENTIAL MISCLASSIFICATION ERROR

Of the 2,400 scenarios in the DME analysis, 767 (32%) overestimated (i.e., positive bias) and 1,633 (68%) underestimated (i.e., negative bias) the true odds ratio. Unlike the NDME results, the DME results are complex, not only because of the number of factors involved in the simulations but also because they do not produce a trend that clearly describes the effect of DME in relation to the combinations of factors. Therefore, the DME analysis is presented separately according to the type of bias: negative or positive. Figures 2 to 4 show the medians of the relative biases for different combinations of the six factors included in this analysis. High levels of sensitivity and specificity of the exposure classification among the cases (*Se*_{Ca} and *Sp*_{Ca}, respectively) reduce the bias, but they reduce positive bias more than negative bias (Figures 2A and 2B). A similar effect is observed for the sensitivity and specificity of the exposure classification among the controls (*Se*_{Co} and *Sp*_{Co}, respectively), although the effect is less clear for *Se*_{Co} (Figures 3A and 3B). With regard to the prevalence of the exposure among the controls, the bias is lower when the exposure prevalence is high, independent of the sensitivity or specificity of the exposure classification, both for the cases and for the controls (Figures 2C to 2F and 3C to 3F). Regarding the specificity, the effect was the opposite of that observed for NDME. When the levels of sensitivity and specificity for the cases and the controls were combined with the levels of the true OR, no clear pattern of bias was observed (Figures 2G, 2H, 3G and 3H). After combining *Se*_{Ca}, *Sp*_{Ca}, *Se*_{Co} and *Sp*_{Co}, we observed that the bias decreased more with high levels of specificity than with high levels of sensitivity in both the cases and the controls. However, these trends were not very clear (Figures 4A to 4H).

Table 1 presents the negative, absent or positive bias associated with DME according to each factor. Negative bias percentages decreased with high levels of *Se*_{Ca} and *Sp*_{Co} (Spearman's p-values = 0.017) and increased with high levels of *Sp*_{Ca} and *Se*_{Co}. The complete opposite occurred with positive bias. The percentage of scenarios with absent bias did not vary significantly across the levels of sensitivity and specificity of the exposure classification among the cases and the controls (Spearman's p-values > 0.05). A high exposure prevalence among the controls was associated with increased percentages of absent and negative bias and with a decreased percentage of positive bias, but these correlations were not significant (Spearman's p-values = 1). High ORs were associated with an increase in negative bias and a decrease in positive and absent bias, but these results were not significant (Spearman's p-values = 1).

| Factor | Negative bias, n | Negative bias, % | Lack of bias, n | Lack of bias, % | Positive bias, n | Positive bias, % |
|---|---|---|---|---|---|---|
| Sensitivity (Ca) | | | | | | |
| 0% | 381 | 79.4 | 15 | 3.1 | 84 | 17.5 |
| 25% | 363 | 75.6 | 13 | 2.7 | 104 | 21.7 |
| 50% | 333 | 69.4 | 19 | 4.0 | 128 | 26.7 |
| 75% | 294 | 61.3 | 20 | 4.2 | 166 | 34.6 |
| 100% | 221 | 46.0 | 16 | 3.3 | 243 | 50.6 |
| r/p-value | -1/0.017 | | 0.6/0.35 | | 1/0.017 | |
| Specificity (Ca) | | | | | | |
| 0% | 175 | 36.5 | 19 | 4.0 | 286 | 59.6 |
| 25% | 268 | 55.8 | 28 | 5.8 | 184 | 38.3 |
| 50% | 334 | 69.6 | 18 | 3.8 | 128 | 26.7 |
| 75% | 383 | 79.8 | 13 | 2.7 | 84 | 17.5 |
| 100% | 432 | 90.0 | 5 | 1.0 | 43 | 9.0 |
| r/p-value | 1/0.017 | | -0.9/0.083 | | -1/0.017 | |
| Sensitivity (Co) | | | | | | |
| 0% | 273 | 56.9 | 17 | 3.5 | 190 | 39.6 |
| 25% | 299 | 62.3 | 17 | 3.5 | 164 | 34.2 |
| 50% | 319 | 66.5 | 18 | 3.8 | 143 | 29.8 |
| 75% | 340 | 70.8 | 16 | 3.3 | 124 | 25.8 |
| 100% | 361 | 75.2 | 15 | 3.1 | 104 | 21.7 |
| r/p-value | 1/0.017 | | -0.67/0.219 | | -1/0.017 | |
| Specificity (Co) | | | | | | |
| 0% | 458 | 95.4 | 4 | 0.8 | 18 | 3.8 |
| 25% | 421 | 87.7 | 7 | 1.5 | 52 | 10.8 |
| 50% | 359 | 74.8 | 25 | 5.2 | 96 | 20.0 |
| 75% | 259 | 54.0 | 34 | 7.1 | 187 | 39.0 |
| 100% | 95 | 19.8 | 13 | 2.7 | 372 | 77.5 |
| r/p-value | -1/0.017 | | 0.7/0.233 | | 1/0.017 | |
| Prevalence | | | | | | |
| 5% | 766 | 63.8 | 23 | 1.9 | 411 | 34.3 |
| 30% | 826 | 68.8 | 60 | 5.0 | 314 | 26.2 |
| r/p-value | 1/1 | | 1/1 | | -1/1 | |
| Odds ratio | | | | | | |
| 2 | 692 | 57.7 | 47 | 3.9 | 461 | 38.4 |
| 7 | 900 | 75.0 | 36 | 3.0 | 264 | 22.0 |
| r/p-value | 1/1 | | 1/1 | | -1/1 | |

Ca: cases. Co: controls. r: Spearman correlation coefficient.
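The r/p-value rows of Table 1 can be examined with a small exact permutation version of Spearman's test. This is a sketch under assumptions: the text does not state that exact p-values were used (this is inferred from the reported values), the helper names `ranks` and `spearman_exact` are hypothetical, and the implementation assumes there are no tied values.

```python
from itertools import permutations


def ranks(values):
    """Ranks 1..n of the values (assumes no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r


def spearman_exact(x, y):
    """Spearman's r and an exact two-sided permutation p-value (no ties)."""
    n = len(x)
    rx, ry = ranks(x), ranks(y)

    def rho(ra, rb):
        d2 = sum((p - q) ** 2 for p, q in zip(ra, rb))
        return 1 - 6 * d2 / (n * (n * n - 1))

    r_obs = rho(rx, ry)
    perms = list(permutations(ry))
    # Count rank orderings at least as extreme as the one observed
    extreme = sum(abs(rho(rx, p)) >= abs(r_obs) - 1e-12 for p in perms)
    return r_obs, extreme / len(perms)


levels = [0, 25, 50, 75, 100]                   # sensitivity (Ca), %
negative_pct = [79.4, 75.6, 69.4, 61.3, 46.0]   # Table 1, negative bias %
absent_pct = [3.1, 2.7, 4.0, 4.2, 3.3]          # Table 1, lack of bias %

print(spearman_exact(levels, negative_pct))   # r = -1.0, p = 2/120, about 0.017
print(spearman_exact(levels, absent_pct))     # r about 0.6, p = 42/120 = 0.35
```

With only two levels, as for the prevalence and the odds ratio, both possible orderings are equally extreme, so the p-value is necessarily 1 whatever the data show.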

DISCUSSION

In research in general and in epidemiological research in particular, different forms of
bias are frequently present and lead to findings that do not reflect precisely what is
occurring. Biases are the result of deficiencies in the determination of the sample
size, representativeness, data collection, definitions of relevant variables or design.
The best-known biases in epidemiological research are selection, survival, loss to
follow-up, detection, recall, inclusion/exclusion, confounding and
misclassification^{16}.

Misclassification bias refers to errors made when classifying an individual into a certain group, such as diseased individuals or individuals exposed to a certain factor. It occurs when non-validated instruments are used and the sensitivity and specificity of the classification method are unknown. Misclassification errors act in a particular direction, positive or negative^{17}, overestimating or underestimating the true value of the association measure, such as the relative risk (RR) in cohort studies, the hazard ratio (HR) in longitudinal studies and the odds ratio (OR) in case-control studies. Misclassification error can be differential or non-differential. When DME occurs, the sensitivity and/or the specificity of the classification of the subjects differ among comparison groups^{3}. NDME results in a decrease in statistical power (i.e., the ability of a test to show an association when one really exists), biasing the value of the OR towards the null^{2,6-11}. DME biases the association either towards or away from the null hypothesis^{12}. Most of the time, it is impossible to predict the direction of the bias because of the complex framework involving differences in sensitivity, specificity and exposure prevalence between the cases and the controls. Although Chyou^{18} claims to have found patterns of the effect of DME, such patterns are described only for certain pre-established scenarios and not for a general scenario, as our study intended. In fact, this is one of the limitations Chyou refers to in his paper^{18}. One aspect to highlight in the description of DME is the strong influence of the prevalence of exposure and of the estimated effect size. In practice, DME often arises when surveys are administered, because cases may remember the exposure better than controls, who have less motivation to remember. These surveys do not always prompt completely true responses, not because an individual intended to lie but because the individual cannot recall their exposure history^{3}. The immediate consequence is that false positives or false negatives arise when the exposure classification is determined^{19}.

This simulation study analyzed the bias in an odds ratio estimate generated in the presence of DME and NDME in a case-control design for different levels of sensitivity (*Se*) and specificity (*Sp*) of the exposure classification (in the cases and the controls), the exposure prevalence among the controls and the true OR. The NDME analysis was simpler than the DME analysis because it was possible to observe easily interpretable trends. In the simulation, we found that 100% of the bias produced by NDME in estimating the OR is negative, weakening the association and driving the estimate of the risk measure towards the null hypothesis. In addition, we found that unbiased OR estimates in the presence of NDME are approached faster with increasing specificity than with increasing sensitivity. This could have consequences for the selection of strategies to classify individuals, favouring those with a greater chance of excluding the presence of the studied event (i.e., more specific ones). DME biases the association positively or negatively, driving the OR estimate towards or away from the null hypothesis with great uncertainty. This situation, in which the magnitude of the association is under- or overestimated, leads researchers to affirm associations that do not truly exist or, on the contrary, to affirm their non-existence when they are actually present. This could have serious implications for the generation of new knowledge, up to the reporting of opposite effects devoid of plausibility. For example, this could be a plausible explanation for the protective effect of cigarette use on breast cancer reported in certain studies^{19}. Ideally, therefore, researchers should question how cigarette use has been measured and then implement better approaches, such as measurements of metabolites in blood or hair, thereby avoiding self-report^{20}.

In this simulation, DME of the exposure in a case-control study produced positive bias less often than negative bias (30 *versus* 70%). However, the positive bias is highly variable and reaches unexpected sizes (as high as 50,000%). Another important finding of this study is that the effect of the factors (sensitivity and specificity of the exposure classification, exposure prevalence among the controls, and true OR) on relative bias is completely different for positive bias than for negative bias. There were no similarities between the two types of bias, and the bias seemed to be affected by all of the factors at the same time and not independently, which would indicate a high degree of interaction between them. These interactions are difficult to interpret using graphics because the graphics describe only two-way interactions. However, the linear models revealed significant second-, third-, fourth- and fifth-order interactions (data not shown). Of the 57 possible interactions in the DME analysis, 22 (39%) were significant for negative bias and 27 (47%) for positive bias, showing the degree of complexity of this analysis in comparison with the NDME analysis, which revealed only two significant interactions among the 11 possible interactions (18%).
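The interaction counts quoted above can be checked with a brief calculation: with *k* factors, the possible interaction terms are all subsets of two or more factors. The helper name `n_interactions` is hypothetical.

```python
from math import comb


def n_interactions(n_factors):
    """Number of interaction terms (subsets of 2 or more factors)."""
    return sum(comb(n_factors, k) for k in range(2, n_factors + 1))


print(n_interactions(6), n_interactions(4))   # 57 11
# Percentages of significant interactions reported in the text
print(round(100 * 22 / 57), round(100 * 27 / 57), round(100 * 2 / 11))   # 39 47 18
```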

Our analyses were based on varying the sensitivity and specificity of the classification of an event of interest. However, we recognize that there are other approaches that additionally include the predictive value of the classification methods, such as the one proposed by Marshall^{21}. Even so, we believe that the approach presented here is useful for understanding the effects of limitations in sensitivity and specificity, and even more useful because researchers could readily find explanations for a lack of association or for implausible findings through systematic reviews of the limitations of the instruments they used. Although prior information such as the sensitivity and specificity of the exposure classification could allow researchers to detect misclassification bias^{22} and correct it^{23,24}, we maintain that it is very useful to be aware of the effects of misclassification and of how these could be avoided if researchers validate their classification methods. As with any other type of bias, it will always be better to avoid it than to have to correct it.

CONCLUSION

A misclassification error is likely to occur whenever the sensitivity and specificity of the instruments used to compare groups are not exactly 100%. If this type of error occurs, it is preferable that it be non-differential. In cases of NDME, the error is balanced between the groups being compared, and the study results are easier to explain. In a case-control study, it is important to pay close attention not only to the identification of cases and to the appropriate selection of controls but also to the classification of the exposure. Thus, the use of validated classification instruments is emphasized so that instruments with high sensitivity and high specificity are selected. The design of the instrument and the appropriate formulation of the questions on a questionnaire are key components for avoiding DME. It is therefore important that epidemiologists and researchers be closely involved not only in the development and design of surveys, but also in monitoring the quality with which they are administered, as well as in the quality control of other classification procedures and of laboratory and/or diagnostic tests.