Use of capture-recapture analysis in data sources

I have read with much interest your recent article on estimating the incidence of visceral leishmaniasis with a relatively simple two-source capture-recapture analysis through Chapman's Nearly Unbiased Estimator, which is only valid when data sources are independent, among other underlying assumptions.6 Although most assumptions are described in detail, for one of the most important, namely source independence, the authors only mention in the Discussion that they have "verified" that the data sources, as used in this study, are generally independent, without further explanation. They also state that it is not possible to determine the level of pair-wise interdependence (i.e. the probability of being observed in one register is affected by being [positive dependence] or not being [negative dependence] observed in another register) in a two-source capture-recapture analysis. I assume the authors mean that the data, as presented in the Venn diagram in the figure, allow for investigation of possible pair-wise interdependencies by means of three-source capture-recapture analysis through log-linear modeling.1,3-5,8-10 The use of death certificates as a third source of information allows for the probability of patients being reported to the notification and/or hospital admission register, as shown in the figure, although intuitively it could create a negative interdependency with the other registers. The preferred log-linear model (Table), selected by the lowest Akaike Information Criterion and incorporating the interaction terms SIM*SIH and SINAN*SIH, gives an estimate of 10,842 (95% confidence interval 10,120;11,754) patients with visceral leishmaniasis.
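For readers less familiar with the approach, the preferred model described above can be written out in the usual log-linear form for three incomplete registers. This is a sketch in standard textbook notation, not taken from the article: the index labels (i for SINAN, j for SIH, k for SIM), the symbols m, lambda and n_obs, and the AIC form shown (one commonly used in epidemiological capture-recapture work) are assumptions made here for illustration.

```latex
% Assumed labelling: i, j, k in {0,1} mark absence/presence in SINAN, SIH and SIM.
% m_{ijk} is the expected count in cell (i,j,k); cell (0,0,0) is unobserved.
\begin{align*}
\log m_{ijk} &= \lambda_0 + \lambda_1 i + \lambda_2 j + \lambda_3 k
                + \lambda_{12}\, ij + \lambda_{23}\, jk \\
% The SINAN*SIM term (\lambda_{13}\, ik) is left out, as in the preferred model;
% the three-way term cannot be estimated from the seven observed cells.
\hat{m}_{000} &= \exp(\hat{\lambda}_0) \\
\hat{N} &= n_{\mathrm{obs}} + \hat{m}_{000} \\
\mathrm{AIC} &= G^2 - 2\,\mathrm{df}
\end{align*}
```

Under this parameterization the intercept gives the expected number of patients missed by all three registers, which is added to the number of observed patients to obtain the population estimate.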
Internal validity analysis by the two-source capture-recapture method shows that there is some relevant positive interaction between the SINAN and SIH registers, as the estimate of 10,207 cases is slightly lower.2 Two-source capture-recapture analysis using the SIM and SIH registers estimates 12,288 (95% CI 10,358;14,395) cases, indicating a relevant negative interaction between these registers, resulting in an overestimate. The two-source analysis of the pair whose interaction term was not identified as significant in the three-source analysis, namely SINAN and SIM, estimates 10,691 cases, very close to the selected three-source capture-recapture estimate. An alternative population estimation model, the truncated binomial model, estimates 12,197 cases, but the high coefficient of variation of the data source probabilities (0.71) indicates violation of the equiprobability assumption underlying the binomial model, analytically resulting in overestimation.11 The coverage of SINAN according to the three-source capture-recapture analysis (5,896/10,842 cases) is 54.3%, close to the authors' estimate of 55%, as expected. In general one should be cautious about relying on two-source capture-recapture estimates alone for estimating the completeness of reporting of infectious diseases, as relevant (often positive) interdependencies between the data sources used, such as notification, laboratory or hospital registers, could result in (untested and therefore unobserved) biased estimates.
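The two-source estimates and the coverage figure discussed above follow from simple arithmetic, which may be illustrated with a minimal Python sketch of Chapman's nearly unbiased estimator. The function names and the register counts in the usage lines are hypothetical and chosen only for illustration; they are not the counts from the study, and the normal-approximation confidence interval is one common choice rather than necessarily the one used in the article.

```python
import math


def chapman_estimate(n1, n2, m, z=1.96):
    """Chapman's nearly unbiased two-source estimate with an approximate CI.

    n1, n2: number of cases in register 1 and register 2
    m:      number of cases found in both registers after linkage
    Assumes independent sources, a closed population and perfect linkage.
    """
    n_hat = (n1 + 1) * (n2 + 1) / (m + 1) - 1
    # Variance of the Chapman estimator
    var = ((n1 + 1) * (n2 + 1) * (n1 - m) * (n2 - m)) / ((m + 1) ** 2 * (m + 2))
    half_width = z * math.sqrt(var)
    return n_hat, (n_hat - half_width, n_hat + half_width)


def coverage(n_source, n_hat):
    """Completeness of one register relative to the estimated total."""
    return n_source / n_hat


# Hypothetical counts, for illustration only (not the study data)
n_hat, (lower, upper) = chapman_estimate(n1=600, n2=400, m=240)
print(f"Estimated cases: {n_hat:.0f} (95% CI {lower:.0f};{upper:.0f})")
print(f"Coverage of register 1: {coverage(600, n_hat):.1%}")
```

A positive dependence between the two registers inflates m and therefore lowers the estimate, while a negative dependence does the opposite, which is the pattern described in the paragraph above.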
This further, less mysterious, explanation of the preferred methodology for this study, i.e. three-source capture-recapture analysis, should be useful to the readers. Capture-recapture analysis, as a method to estimate infectious disease incidence and completeness of registration, is often not the cheap, quick, simple and reliable method it was once advocated to be.7

Consultant Tuberculosis Control Physician/Epidemiologist
Tuberculosis Control Section
Municipal Public Health Service Rotterdam-Rijnmond, Netherlands
vanhestr@ggd.rotterdam.nl

It is important to stress the need to take into account the assumptions required for the application of capture-recapture analysis (closed population, effective linkage of registers, diagnostic validity, and source independence). When these assumptions are violated, i.e., when any assumption cannot be fully met, this should be described in detail and discussed. For this reason, many investigators question the application of capture-recapture analysis in epidemiological studies.
In the study of registers of visceral leishmaniasis information, the authors proposed a combination of pair-wise sources using a simple approach for the intended estimates, since the data sources analyzed were the same as those used in studies of other diseases in Brazil,5 and because the approach is easy to apply in health care services. They also pointed out that, as they chose to study pair-wise sources, they did not determine the level of source interdependence. The authors are nevertheless aware that log-linear models and the Bernoulli census are approaches for the treatment of source interdependence.2,3 In his letter to the editor, Van Hest presents a demonstration of source interdependence based on a log-linear model. It should be noted, though, that an analysis performed without ensuring the adequacy of the data sources can lead to inaccurate conclusions.
It should also be underlined that the article "Analysis of visceral leishmaniasis reports by the capture-recapture method" had no intention to either exhaust this subject or claim that capture-recapture analysis is the best choice for disease monitoring by epidemiological surveillance systems. Yet the article provides relevant information for leishmaniasis surveillance in Brazil and promotes the debate on the application of capture-recapture analysis in epidemiology.

Table. The eight possible three-source log-linear capture-recapture models with the likelihood ratio (G²), P-value, degrees of freedom (df), Akaike Information Criterion (AIC), estimated number of visceral leishmaniasis patients (N_est) and 95% confidence interval (95%CI).