The estimated magnitude of AIDS in Brazil: a delay correction applied to cases with lost dates

The number of HIV-infected people is an important measure of the magnitude of the AIDS epidemic in Brazil and allows for comparison with epidemic patterns in other countries. This quantity can be estimated from the number of reported AIDS cases, which in turn needs to be corrected for the distribution of reporting delays and under-recording of cases. These distributions are unknown and must also be estimated from the recorded dates, some of which were lost from the Brazilian National AIDS registry. This paper estimates the number of AIDS cases diagnosed by imputing the lost information based on an estimate of the pattern in registration delay until 1996. We first fitted a non-stationary bivariate Poisson regression model to estimate the pattern in reporting delay. In the subsequent steps these models were applied to impute new data, thus replacing the missing information, and to estimate the magnitude of the AIDS epidemic in the country. Model estimates ranged from 36,000 to 50,000 AIDS cases diagnosed in Brazil and still unreported. The epidemic was therefore 20 to 30% greater than known from the information available as of February 1999. To be useful to health policy-makers, the surveillance system based on officially reported AIDS cases must be continuously improved.


Introduction
The AIDS epidemic in Brazil is monitored based on cases reported by the official surveillance system. Reporting is mandatory and is done by filling out a standard form. These data also help to plan the distribution of medicines, and to plan and evaluate preventive measures. It is thus indispensable to monitor the quality and consistency of existing information in such a complex and mutating epidemic. The HIV incubation period, already long since the beginning of the epidemic, was further increased by the introduction of combination anti-retroviral treatment. Since data published by the surveillance system refer to the number of infected individuals who have already developed AIDS, they tend to be even further from the actual number of HIV-infected individuals.
These data exclude not only individuals who are infected and asymptomatic but also cases that have never been reported (under-reporting) or those that will be reported late. To estimate the number of HIV-infected individuals based on the number of AIDS cases reported to the surveillance system, one needs an estimate of the number of AIDS cases at that moment, i.e., one must correct for both reporting delays and estimated missing cases. Studies on under-reporting in Brazil are still limited to certain States of the country and certain periods of time and have utilized hospital or death records. They suggest under-reporting rates from 15% to 43.3% (Ferreira & Portela, 1999; Lemos, 1998). To correct for reporting delay one must estimate the lag pattern, i.e., the distribution of the time elapsed between case diagnosis and reporting. Various methods have been applied to this estimate since the beginning of the AIDS epidemic. They assume that diagnosis and reporting dates are properly recorded, and many use these dates to compose time intervals for diagnosis and delay, besides making corrections based on the fitting of log-linear models (Brookmeyer & Damiano, 1989; Brookmeyer & Liao, 1990; Harris, 1990; Zeger et al., 1989). With other types of approaches, Sellero et al. (1996) formulated the problem of estimating the delay distribution as one of survival analysis, calculating the delay as a number of days, while Brookmeyer & Gail (1994) made assumptions about the change in reporting behavior over time, fitting a log-logit model through diagnostic periods for each delay interval.
In delay corrections conducted for the Brazilian case, Barbosa & Struchiner (1997a, 1997b, 1998) used 88,349 cases reported as of September 1996, of which only 70% contained the reporting date, and estimated the epidemic's magnitude by both region and exposure category, assuming that the dates were missing at random. They compared two sets of results: those obtained when the delay was measured in number of days and the correction was based on a survival model, and those obtained from a Poisson regression when the cases were distributed in a contingency table formed by cross-tabulating semester of diagnosis with number of semesters of delay. In evaluating the results, the authors highlighted the need for a model that incorporated the effect of the semester of diagnosis, since the reporting pattern could be affected by free distribution of combination anti-retroviral treatment.
According to these evaluations, in order to update the delay corrections and estimate the magnitude of the Brazilian epidemic, models were needed that considered the hypothesis of non-stationary delay. Nevertheless, due to a technical problem, and this time in non-random fashion, an even larger amount of case-reporting data was lost from the database of the National Disease Reporting Information System (SINAN). More recent studies have discussed various aspects related to the effect of correcting reporting delay, when one considers the concrete situation of surveillance system data. Thus, Law & Kaldor (1997) propose to take into account the way in which time lag is measured, and Gebhardt et al. (1998) evaluate the effect of including non-stationarity in the models when comparing incidence in various countries. Becker & Kui (1997) analyzed the effect of including AIDS data prior to the beginning of registration of reporting dates. Others have proposed that the analytical approach assuming delay stationarity throughout the period was responsible for distortions in estimates of AIDS cases in recent periods (Gebhardt et al., 1998; Harris, 1990; Lindsey, 1996).
In order to estimate AIDS cases already diagnosed but still not reported in Brazil as of late 1998, this study attempts to deal with the above-mentioned losses of reporting dates. The idea was thus to seek a model to correct the delay as of 1996, using reports occurring until February 1999 as one of the elements for evaluating the model. Based on information available in the delay registry as of June 1996, a first approximation was generated for the relevant quantities, based on a statistical model taking non-stationary behavior into account and following the approach proposed by Lindsey (1996). The results were used to impute the information missing from 1996 to 1998. Another iteration was then applied to these values, leading to the forecast magnitude of the epidemic in 1998.

Database and Epidemiological Bulletin
There are two principal sources of data in Brazil allowing one to infer the behavior of the AIDS epidemic. One source, the SINAN Database, involves primary data, while the other involves secondary data: the Epidemiological Bulletins published regularly since the beginning of the epidemic and consisting of the periodical totalization of data from the base by place of residence, State (including the Federal District), year of diagnosis, exposure category, age bracket, sex, etc. The dates pertaining to the various events characterizing the epidemic (AIDS disease, reporting, and death) are essential as a source of primary information and serve to construct various epidemiologically relevant measurements. This information allows us to establish retrospective cohorts that serve as a starting point to estimate, for example, distribution of HIV incubation time, distribution of AIDS survival time, or distribution of reporting delay, necessary to correct the AIDS incidence at any specific moment. Making this correction means estimating the size of the HIV-infected cohorts, which are constituted based on the AIDS diagnosis and whose event of interest is the reporting date. With the lack of information concerning the reporting date that occurred since 1997 in the SINAN Database, it was necessary to use data totalized by year of diagnosis and published in the Epidemiological Bulletins as an ancillary instrument to infer the cohorts' behavior.
Table 1 describes the number of AIDS cases by year of diagnosis from 1986 to 1997, as published in the August Bulletins each year. These data demonstrate that the number of new reported cases increased from some two thousand in diagnostic year 1986 to 16 thousand in diagnostic year 1997. Note that since 1992 Brazil has used two different criteria to define an AIDS case. The so-called Rio de Janeiro/Caracas criterion is based on a scoring system for signs and symptoms, plus positive HIV serology. The second criterion is a modified version of the definition proposed in 1988 by the Centers for Disease Control and Prevention (CDC) that included encephalopathy and the cachectic syndrome (CNDST/AIDS, 1994). This new criterion expands the range of diseases and may at least partially explain the increase in the number of cases. The data also illustrate the importance of delay correction. For example, the August 1994 Bulletin included some 11 thousand cases diagnosed in 1992 and 11.5 thousand diagnosed in 1993. By August 1998 these same figures had already increased to some 14 and 16 thousand cases, respectively, reflecting an epidemic with seven thousand cases more than published five years previously, for these two diagnostic years alone. The implications for evaluating the impact of HIV infection, planning control measures, and estimating the epidemic's magnitude are easy to perceive.

Presentation of the problem
All AIDS cases diagnosed and already reported by December 1998 (Y_obs), with their respective diagnosis and reporting dates, constitute the complete database necessary to conduct the delay correction. This database is not available due to the loss of reporting dates already referred to. It would allow one to form a contingency table whose dimensions refer to the semester of diagnosis and the number of semesters of delay, as illustrated in Figure 1. This table would have many empty cells (Y_cens) due to cases still not reported because of the time lag since diagnosis (right-censoring). The objective of the reporting delay study was thus to estimate the AIDS cases already diagnosed and still not reported in each year (T_est), i.e., to estimate the empty cells in this table, revealing the magnitude of the AIDS epidemic based on the estimated marginal totals of the incomplete rows (T_inc). To conduct this estimate, we simultaneously took into account the delay pattern and the incidence of diagnoses by semester. However, due to the loss of reporting dates beginning in June 1996, it was only possible to construct the table presented in Figure 2, where from the twenty-first semester of observation onward (Y_miss) we lacked information on the distribution of diagnosed cases by delay, so that the reporting delay pattern was unknown, but where the number of cases reported and published in the Epidemiological Bulletin gave us the total cases per year of diagnosis (T_bul).
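As a rough illustration of the contingency table described above, the following sketch cross-tabulates cases by semester of diagnosis and semesters of delay and marks the right-censored cells. The function name `delay_table`, the 0-based semester indexing, and the toy records are assumptions made for this example; none of them come from the SINAN Database.

```python
import numpy as np

def delay_table(diag_sem, report_sem, n_sem):
    """Cross-tabulate cases by semester of diagnosis (rows) and semesters of delay (columns).

    Cells whose reporting semester would fall after the last observed
    semester are right-censored and are marked with NaN rather than 0.
    """
    table = np.zeros((n_sem, n_sem))
    for t, r in zip(diag_sem, report_sem):
        table[t, r - t] += 1
    # Cell (t, u) can only be observed if t + u <= n_sem - 1.
    t_idx, u_idx = np.indices(table.shape)
    table[t_idx + u_idx > n_sem - 1] = np.nan
    return table

# Toy records (hypothetical): semester of diagnosis and semester of report, 0-based.
diag = [0, 0, 0, 1, 1, 2, 2, 3]
rep = [0, 1, 3, 1, 2, 2, 3, 3]
tab = delay_table(diag, rep, n_sem=4)

# Cases reported so far for each semester of diagnosis (incomplete row totals).
row_totals = np.nansum(tab, axis=1)
print(tab)
print(row_totals)
```

The NaN cells play the role of the empty cells that the delay correction has to estimate; the row totals are incomplete for every semester of diagnosis except the first.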

Methods
The methodological problem can be described as a statistical prediction based on censored and missing data. The magnitude of the AIDS epidemic in December 1998 was thus estimated by data modeling and imputation in three phases:
• Phase 1: Using the data for diagnoses already reported as of December 1995 and indicated in Part I of Figure 2 (Y_obs), we sought a model that would estimate the unreported diagnosed cases (Y_cens) so that they approximated the cases published in the February 1999 Bulletin with diagnosis prior to that date (T_bul).
• Phase 2: The model developed in the previous phase was used to impute Part II of Figure 2 (Y_miss) and then to estimate the delay pattern from 1996 to 1998, excluding the estimates of censored cells (Y_cens). This delay pattern was applied to the annual diagnoses already reported and published in the February 1999 Bulletin (T_bul). Thus, each cell Y_ij in the table was obtained as Y_ij = p_ij * T_i, where p_ij represents the proportion of diagnoses in year i reported with j semesters of delay, obtained from the model, and T_i is the total diagnoses in year i already reported.
• Phase 3: The same model was applied to the data generated in Phase 2, obtaining new parameters, which were then used to estimate the AIDS cases already diagnosed and still not reported as of February 1999 (T_est).
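The Phase 2 allocation, in which each imputed cell is a model-based delay proportion multiplied by the reported total for its diagnosis period, can be sketched as follows. The function name `impute_cells`, the proportions, and the totals are hypothetical values chosen for illustration. The renormalization of each row so that it adds back to the reported total is an assumption of this sketch, intended to mirror the exclusion of censored cells.

```python
import numpy as np

def impute_cells(delay_props, totals):
    """Allocate each diagnosis period's reported total across delay intervals."""
    delay_props = np.asarray(delay_props, dtype=float)
    totals = np.asarray(totals, dtype=float)
    # Renormalize each row so the allocated cells add back to the reported total.
    p = delay_props / delay_props.sum(axis=1, keepdims=True)
    return p * totals[:, None]

# Hypothetical delay proportions: rows are diagnosis periods, columns are delay intervals.
props = [[0.5, 0.3, 0.2],
         [0.6, 0.3, 0.1],
         [0.4, 0.4, 0.2]]
totals = [1000, 2000, 500]  # hypothetical totals already reported per diagnosis period

cells = impute_cells(props, totals)
print(cells)
```

Each row of `cells` sums to the corresponding reported total, so the imputed table stays consistent with the published figures while carrying the delay pattern needed for the Phase 3 refit.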

Suppositions for the model
The search for a model took into account how the estimates obtained by Barbosa & Struchiner (1998) compared with the figures subsequently published in the Epidemiological Bulletins. The model was thus based on several premises:
• Models previously used in Brazil failed to capture the change in the epidemic's behavior beginning in 1993 and thus underestimated the epidemic's magnitude in recent years.
• Missing reporting dates prior to 1996 were presumed to be random.
• The supposition in previous corrections, i.e., that delay was negligible after four years, was untrue.
• Free distribution of combination antiretroviral treatment altered reporting behavior.

Results
After fitting various models proposed and studied by Lindsey (1996) and testing various transformations of the variables pertaining to delay and diagnosis time, two models were chosen. The first was a non-stationary model (Model 1) where, beginning in 1993, the delay interacted the reciprocal of diagnosis time cubed with the logarithm of this same time: [equation missing in source] where t is the semester of diagnosis and u the number of semesters of delay. AIDS cases Y_tu observed in t and reported u semesters later were first considered as coming from a bivariate Poisson distribution with parameter λ(t, u), and thus a log-linear model was fitted using the GLM function of S-Plus (Statistical Science, 1993). However, there were indications that the case variance was much greater than that predicted by the Poisson model. Since this is known to lead to underestimation of the standard errors of the parameters, a quasi-likelihood function was used with constant variance, without specifying the data's distributional form (Demetrio & Hinde, 1998).
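The general idea of fitting a log-linear model by quasi-likelihood and using it to fill right-censored cells can be sketched as below. This is not Model 1 or Model 2: the linear covariates in t and u, the coefficient values, the noise factors, and the function names `fit_quasi_loglinear` and `design` are all assumptions for illustration. The original analysis used S-Plus, whereas this sketch reimplements iteratively reweighted least squares with NumPy. The `variance` argument switches between a constant variance function and the Poisson-type variance proportional to the mean.

```python
import numpy as np

def fit_quasi_loglinear(X, y, variance="constant", n_iter=50, tol=1e-10):
    """Fit E[y] = exp(X @ beta) by iteratively reweighted least squares.

    variance="constant" uses V(mu) = 1; variance="mu" uses V(mu) = mu.
    Returns coefficients, the Pearson dispersion estimate, and standard errors.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Starting values from ordinary least squares on the log scale.
    beta = np.linalg.lstsq(X, np.log(y + 0.5), rcond=None)[0]
    for _ in range(n_iter):
        eta = X @ beta
        mu = np.exp(eta)
        v = np.ones_like(mu) if variance == "constant" else mu
        w = mu ** 2 / v                # working weights for the log link
        z = eta + (y - mu) / mu        # working response
        xtw = X.T * w
        new_beta = np.linalg.solve(xtw @ X, xtw @ z)
        converged = np.max(np.abs(new_beta - beta)) < tol
        beta = new_beta
        if converged:
            break
    mu = np.exp(X @ beta)
    v = np.ones_like(mu) if variance == "constant" else mu
    w = mu ** 2 / v
    dispersion = np.sum((y - mu) ** 2 / v) / (len(y) - X.shape[1])
    cov = dispersion * np.linalg.inv((X.T * w) @ X)
    return beta, dispersion, np.sqrt(np.diag(cov))

def design(cells):
    """Hypothetical design matrix: intercept, diagnosis semester t, delay u."""
    return np.array([[1.0, t, u] for t, u in cells])

n = 4
observed = [(t, u) for t in range(n) for u in range(n - t)]     # t + u <= n - 1
censored = [(t, u) for t in range(n) for u in range(n - t, n)]  # not yet reportable

true_beta = np.array([5.0, 0.2, -0.8])  # hypothetical coefficients
noise = np.array([1.1, 0.9, 1.05, 0.95, 1.1, 0.9, 1.05, 0.95, 1.1, 0.9])
y = np.exp(design(observed) @ true_beta) * noise

beta, dispersion, se = fit_quasi_loglinear(design(observed), y)
# Predicted counts in the censored cells: diagnosed but not yet reported.
unreported = np.exp(design(censored) @ beta).sum()
print(beta, dispersion, se, unreported)
```

Scaling the covariance by the estimated dispersion is what guards against the understated standard errors that a plain Poisson fit would give for overdispersed counts.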
Selection of the second model (Model 2) was based on the parameters of the previous model and the attempt to obtain a more parsimonious but also non-stationary model: [equation missing in source] In addition to the usual statistical criteria for choice of models based on goodness-of-fit measurements and residuals analysis, we also used the best approximation to the AIDS diagnoses published in the February 1999 Epidemiological Bulletin. Note that the fit in these models was performed using only the data reported as of June 1996. We applied the delay pattern found by the models to the annual diagnoses already reported and published in the February 1999 Bulletin (T_bul) as explained above. Using these data, we fitted the previous models again and obtained other parameters that allowed us to construct two tables pertaining to the 155,689 cases reported as of February 1999, completing the lost delay information with that obtained through the two models described above.
Table 2 shows the results of the totalization (by year of diagnosis) obtained from the models' estimates, as well as the real observations published in the February 1999 Bulletin. According to these estimates, by the end of 1998 Brazil had 36,000-50,000 AIDS cases diagnosed and still not reported, i.e., a total of some 190,000-200,000 cases already diagnosed rather than the 154,000 reported as of February 1999.
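The relation between these figures can be checked with simple arithmetic. The sketch below uses only the rounded numbers quoted in the text; the variable names are illustrative.

```python
reported = 154_000                                # cases reported as of February 1999
unreported_low, unreported_high = 36_000, 50_000  # model estimates of unreported cases

# Total cases already diagnosed under each estimate.
total_low = reported + unreported_low
total_high = reported + unreported_high

# Unreported cases as a fraction of the reported cases.
excess_low = unreported_low / reported
excess_high = unreported_high / reported

print(total_low, total_high)
print(round(100 * excess_low, 1), round(100 * excess_high, 1))
```

The totals come to 190,000 and 204,000, and the excess to roughly 23% and 32%, which agrees with the rounded ranges quoted for the total and for the share of the epidemic not yet visible in the reports.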

Discussion
Problems with the recording of AIDS reporting dates in Brazil have impeded delay correction with either the frequency or accuracy needed. Lack of correction hinders estimates of the number of HIV-infected individuals performed with methods using official AIDS reports. This study attempted to show that the lack of delay correction can also significantly distort the planning and evaluation of resource allocation for hospital care. The results of this study should be viewed as a temporary approach to the problem of missing dates and estimation of the epidemic's magnitude, since the definitive solution to this problem requires application of date imputation methods to the SINAN Database. Barnard & Meng (1999) applied the multiple imputation methods developed by Gelman et al. (1995) to various databases, one of which aimed to impute the dates of AIDS deaths.
Figure 1 legend: Y_cens = number of cases diagnosed in each semester t whose reporting delay will be greater than the observation time. T_inc = total cases diagnosed each year and already reported. T_est = total cases diagnosed each year, including estimates of those that will still be reported.
Despite the limitation of this study, which cannot be generalized to other situations, note that the estimates indicating a growing epidemic may merely be reflecting aspects of surveillance that alter the reporting pattern. Among such aspects are the decrease in under-reporting, changes in official diagnostic criteria, or changes in actual diagnostic criteria, which may have been altered since 1996 to facilitate patients' access to anti-retroviral treatment. AIDS case reporting is still very important in Brazil for those evaluating and monitoring the epidemic, but it is already clear that a combined prevalence of those living with a diagnosis of HIV infection and those living with AIDS would provide a more realistic and useful estimate of necessary therapeutic resources. This highlights the need for official monitoring of HIV cases in Brazil, recently made official in the United States by the CDC (MMWR, 1999). Such monitoring will allow more precise estimates of the effects of medication on incubation and survival time, in addition to allowing for evaluation of preventive campaigns.

Table 1
AIDS cases according to year of diagnosis in the August 1993 to 1998 Bulletins.

Figure 2 legend: T_inc = total cases diagnosed each year and already reported. T_bul = total cases diagnosed each year since 1996 and published in the February 1999 Bulletin. T_est = total cases diagnosed each year, including estimates of those that will still be reported.
Cad. Saúde Pública, Rio de Janeiro, 18(1):279-285, jan-fev, 2002

Table 2
AIDS cases diagnosed until 1998 and estimated by the two models and compared to published data.