Prevalence of corrected arterial hypertension based on the self-reported prevalence estimated by the Brazilian National Health Survey

The objective was to correct the self-reported prevalence of systemic arterial hypertension (SAH) obtained from the Brazilian National Health Survey (PNS 2013). SAH prevalence estimates were corrected by means of the sensitivity and specificity of the self-report question. Sensitivity and specificity values from a similar study (same self-report question, age range and gold standard) were used to this end. A sensitivity analysis was also performed, using the upper and lower limits of the confidence intervals as sensitivity and specificity parameters. The corrected prevalence of SAH for Brazil as a whole was 14.5% (self-reported: 22.1%). Women presented a higher rate of self-reported SAH but, after correction, men were found to have a higher prevalence. Among younger women (18-39 age range), the self-reported prevalence was 6.2%, a value that, after correction, dropped to 0.28%. There was not much difference between self-reported and corrected SAH among the elderly (51.1% vs. 49.2%). For certain groups the corrected results differed greatly from the self-reported prevalence, which may severely impact public health policy strategies.


Introduction
Cad. Saúde Pública 2020; 36(1):e00033619

[...] selection of homes in each tract; and (3) selection of one person aged 18 years or over in each home, by means of simple random sampling. A total of 60,202 people were thus interviewed. Further information on the PNS sampling scheme can be obtained from Souza-Júnior et al. 21 .

Correction of prevalence
Correction of SAH prevalence can be done algebraically using 10 :

$$p_r = \frac{p_a + Sp - 1}{Se + Sp - 1} \qquad (1)$$

where: p_r = real prevalence (corrected); p_a = self-reported prevalence; Sp = specificity; and Se = sensitivity.
However, this solution is not unrestrictedly applicable to every sensitivity and specificity value, being limited to the interval 1 - Sp ≤ p_a ≤ Se (between the complement of specificity and the sensitivity). If this condition is not respected, the solution will yield prevalence results that are negative or greater than 1.
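The algebraic correction and its validity condition can be illustrated with a minimal Python sketch. The function name `rogan_gladen` and the input values are illustrative assumptions; the sensitivity and specificity shown are hypothetical and are not the values from the validation study.

```python
def rogan_gladen(p_a, se, sp):
    """Correct a self-reported prevalence p_a using sensitivity se and specificity sp.

    Raises ValueError when the algebraic correction would fall outside [0, 1],
    i.e. when the condition 1 - Sp <= p_a <= Se is not met.
    """
    if se + sp <= 1:
        raise ValueError("Se + Sp must exceed 1 for the correction to be usable")
    if not (1 - sp <= p_a <= se):
        raise ValueError("p_a lies outside [1 - Sp, Se]")
    return (p_a + sp - 1) / (se + sp - 1)


# Hypothetical sensitivity/specificity, chosen only for illustration.
corrected = rogan_gladen(0.221, se=0.80, sp=0.90)
print(f"corrected prevalence: {corrected:.4f}")
```

With these hypothetical inputs, a self-reported prevalence below 1 - Sp = 0.10 (for example 0.05) would raise an error, which is the situation the adjusted estimator is meant to handle.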
To deal with this problem, Lew & Levy 11 proposed an adjustment to the formula above, such that the correction yields only possible and interpretable results. This estimator therefore ensures that a correction is possible for any self-reported prevalence value. Essentially, the strategy consists in replacing the self-reported prevalence in the previous expression with a quantity d [equation (2) not recoverable], where: n = size of the sample; and x = number of subjects who self-reported the condition. The corrected prevalence is then given by:

$$p_r = \frac{d + Sp - 1}{Se + Sp - 1} \qquad (3)$$

Notwithstanding the analytical solution, the calculation of d is not immediate and depends on specific software capable of complex numerical integration. Moreover, this integration presents a computational limitation related to sample size, which makes the calculation infeasible when the sample is large (usually greater than 1,000 cases). A simplification has therefore been suggested, based on a Bayesian method for large samples. This method consists in proportionally reducing the sample size and the number of people with the condition until reaching the maximum number that can be handled by the available hardware/software 22 .

The 95% confidence interval (95%CI) for the corrected prevalences was calculated using the approximation suggested by Lew & Levy 11 [equation (4) not recoverable], in which SE = standard error.
However, the data for the present study came from a complex sample, and therefore the design effect (deff) needs to be taken into consideration in order to incorporate the loss of precision of the estimates. The design effect is the ratio between the estimate of the variance under the sampling design actually used and the estimate of the variance that would have been obtained through a simple random sample of the same size. Thus, the variance of the estimate for the corrected prevalence is multiplied by the design effect, which is obtained from the survey data with the sampling strategies mentioned before. Therefore, the standard error that takes into account the design effect follows equation (5):

$$SE_{deff} = SE \times \sqrt{deff} \qquad (5)$$

where: SE = standard error; and deff = design effect.
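Since the variance is multiplied by the design effect, the standard error is multiplied by its square root. A minimal Python sketch of that step follows. The function names and numbers are hypothetical, and the normal-approximation interval is an assumption for illustration; it is not necessarily the Lew & Levy approximation used in the study.

```python
import math


def deff_adjusted_se(std_err, deff):
    """Inflate a simple-random-sample standard error by the design effect."""
    if deff <= 0:
        raise ValueError("deff must be positive")
    return std_err * math.sqrt(deff)


def normal_interval(estimate, std_err, z=1.96):
    """Normal-approximation 95% interval, clipped to [0, 1] (assumed form)."""
    return max(0.0, estimate - z * std_err), min(1.0, estimate + z * std_err)


# Hypothetical values: SE of 0.01 under simple random sampling, deff of 4.
se_complex = deff_adjusted_se(0.01, 4.0)
print(se_complex, normal_interval(0.145, se_complex))
```

A design effect of 4 doubles the standard error, so the interval is twice as wide as it would be under simple random sampling.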

Sensitivity and specificity of the question
Correcting the self-reported prevalence of an event demands information on both its sensitivity and specificity, measurements that can be obtained, for example, from similar studies in the literature (similar populations, methods, measuring equipment and survey types). Among the four available Brazilian articles on the validation of self-reported SAH, Lima-Costa et al. 12 used the same question as the PNS, included subjects above 18 years and used a table sphygmomanometer as gold standard. Therefore, that study provided the basic measurements needed for the corrections (general and age-related sensitivities and specificities; see Table 1). Combined values for sex and age were obtained from the article's raw data (available from the authors).

Variables used
The question used for ascertaining self-reported SAH was: "Has a doctor or other healthcare professional ever told you that you have high blood pressure or hypertension?" (variable Q004). There were three response categories: "Yes"; "Yes, only during pregnancy" (only for women) and "No". Women who reported SAH only during pregnancy were included in the category "No".
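The recoding rule can be sketched in Python as follows. The function names are hypothetical, the response strings stand in for whatever codes the PNS data file actually uses for Q004, and the prevalence shown is unweighted, so it ignores the survey weights that a real analysis of the PNS would need.

```python
def recode_q004(response):
    """Map a Q004 answer to 1 (self-reported SAH), 0 (no SAH) or None (no information)."""
    mapping = {
        "Yes": 1,
        "Yes, only during pregnancy": 0,  # counted as "No"
        "No": 0,
    }
    return mapping.get(response)


def self_reported_prevalence(responses):
    """Unweighted proportion of 'Yes' among respondents with a valid answer."""
    valid = [c for c in map(recode_q004, responses) if c is not None]
    if not valid:
        raise ValueError("no valid responses")
    return sum(valid) / len(valid)


# Hypothetical answers; None represents a respondent with no information.
answers = ["Yes", "No", "Yes, only during pregnancy", None, "No"]
print(self_reported_prevalence(answers))
```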

Statistical analysis
The prevalence of self-reported SAH was estimated for the population as a whole and according to sex and age group. Cases in which no information on self-reported SAH was available were excluded from the analysis. Prevalences were also corrected using the upper and lower limits of the 95%CIs for sensitivity and specificity. The adjusted expression for the Bayesian estimator and its adaptation for large samples were used in cases in which the condition 1 - Sp ≤ p_a ≤ Se was not met.
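The sensitivity analysis over the confidence limits can be sketched in Python as below. The function names and the interval values are hypothetical (they are not the limits reported in the validation study). Combinations for which the condition 1 - Sp ≤ p_a ≤ Se fails are returned as `None` here, whereas the study handled those cases with the adjusted Bayesian estimator.

```python
from itertools import product


def corrected(p_a, se, sp):
    """Algebraic correction; None when 1 - Sp <= p_a <= Se is not met."""
    if se + sp <= 1 or not (1 - sp <= p_a <= se):
        return None
    return (p_a + sp - 1) / (se + sp - 1)


def limit_combinations(p_a, se_ci, sp_ci):
    """Corrected prevalence for each pairing of lower/upper limits of Se and Sp."""
    se_limits = {"LLSe": se_ci[0], "ULSe": se_ci[1]}
    sp_limits = {"LLSp": sp_ci[0], "ULSp": sp_ci[1]}
    return {
        f"{a}/{b}": corrected(p_a, se_limits[a], sp_limits[b])
        for a, b in product(se_limits, sp_limits)
    }


# Hypothetical 95%CIs for sensitivity and specificity.
for label, value in limit_combinations(0.221, (0.70, 0.90), (0.85, 0.95)).items():
    print(label, value)
```

When the condition holds, the lowest corrected value comes from pairing the upper limit of sensitivity with the lower limit of specificity, and the highest from the opposite pairing.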
The Maple v.5 software (https://www.maplesoft.com/) was used for integration and other algebraic manipulation.

Table 1. Sensitivities and specificities found in the validation study on overall self-reported arterial hypertension, according to sex and age group, adapted from Lima-Costa et al. 12 .

Results
Table 2 presents the prevalence of self-reported SAH in Brazil according to sex and age, from the PNS 2013, together with the corrected prevalence values developed in the present study. The adjusted expression for the Bayesian estimator and its adaptation for large samples were used among women aged 18-39 years, among subjects 18-39 years as a whole and for the 95%CIs for sensitivity and specificity among men and women 18-39 years and women 40-59 years. In some categories, large differences between the self-reported and corrected values for SAH could be seen. Across Brazil, regardless of sex and age, the corrected prevalence of SAH was 14.5%, 7.6 percentage points lower than the self-reported value (22.1%). The corrected prevalence for men did not change much, but became higher than that of women (19.5% among men vs. 11.8% among women). This sex difference was especially visible among non-elderly people. Among males, the corrected prevalence in the 18-39 age group was 0.9%, and, in the 40-59 group, 20.6%. Among elderly men, prevalence increased from 45.8% (self-reported) to 51.7% (corrected). Among women in the age group 40 to 59 years, self-reported prevalence was more than three times the corrected value (31% vs. 9.3%). Regardless of sex, self-reported SAH was overestimated in all age groups, but the overestimation error decreased with increasing age (Table 2).

As expected, lower prevalences were found when combining the upper limit for sensitivity with the lower limit for specificity (and higher prevalences in the opposite case). Table 3 presents these prevalences. For the population as a whole, the corrected prevalence ranged from 10.8% to 18.5%. In the age group 18 to 39, it ranged from 0.1% to 20.7% among men. Among women in the same age group, the range was much smaller, from 0.1% to 1.1%. In the age group 40 to 59, a larger variation in the corrected prevalence was also seen among men, from 2.4% to 43.3%. Among the elderly, the range was larger among women, whose corrected prevalence varied from 3% to 68.5%.

Discussion
Knowledge of the real magnitude of a disease in a specific population, for instance estimated by correcting self-reported prevalence, is extremely relevant for public health purposes. Brazilian validation studies on SAH have shown that self-reported prevalence of SAH is usually overestimated by 10% to 15% (without stratification), with larger variations according to sex and age group 12,14 . The present study indicates that the prevalence of SAH is indeed overestimated, such that in some categories the self-reported magnitude may even be twice the real prevalence.
It is noteworthy that in the age group 40 to 59 (a group frequently targeted by health campaigns), self-reported prevalence was overestimated more than twofold. On the other hand, among elderly people (over 60 years of age), overestimation was only 4%. Men were the only category in which self-reported prevalence was underestimated, and the degree of underestimation was small.
A validation study in Pelotas (Brazil) 14 also found that self-reported SAH prevalence was underestimated among men. However, among women, the self-reported prevalence of SAH was overestimated [...]

Table 2. Prevalences of self-reported and corrected systemic arterial hypertension (SAH) in Brazil, according to sex and age group, from the Brazilian National Health Survey (PNS), 2013.

Despite the need for such corrections, only one other study could be found (osteoarthritis in France) in which self-reported prevalence was corrected by means of sensitivity/specificity information. In that case, the authors found that the prevalence was underestimated when self-reported measurements were used (7.9% for self-reporting; 9.1% for the corrected estimates) 25 .
As mentioned, the present study used sensitivity/specificity values from another study 12 in order to obtain corrected estimates for self-reported SAH in Brazil as a whole. Although the populations studied in Lima-Costa et al. and here are not exactly the same, it should be noted that both studies used the same question for ascertaining SAH, included subjects above 18 years old and used the same gold standard for validating SAH. These similarities (and the fact that the questions were asked in the same language) provide a degree of methodological consistency for the use of those estimates. Nevertheless, a limitation of the present study is that further validation should be sought using sensitivity/specificity values from more recent and more comprehensive data, including different regions of the country. Also, in the present study, prevalences were corrected by simulating different combinations of sensitivity and specificity, taking into account their lower and upper confidence interval limits. Although this is an interesting strategy for incorporating uncertainty, it does not consider the plausibility of the results and should be regarded as a "worst case" scenario, since it analyzes the combinations of the upper and lower limits of sensitivity/specificity as if these values were equally likely to occur. Therefore, an excessively pessimistic or conservative picture of the results might be obtained 9 . Another means of obtaining representative and plausible sensitivity/specificity values would be a meta-analysis including all the articles that validated the self-report question.
This study presented the corrected prevalence of SAH in Brazil, according to age and sex, taking as its basis the sensitivity and specificity values of a self-report question posed in 2013. The resulting estimates are therefore closer to the real prevalence, and it was observed that, in all categories except men, the prevalence of SAH was overestimated when the subjects were asked about the disease. In addition, the corrected values were closer to, and in the same direction as, worldwide estimates for the prevalence of SAH. This result is extremely important, since it would enable the formulation of public policies that take into account the proportion of individuals in the Brazilian population that actually have this condition.

LLSe: lower limit of the 95% confidence interval for sensitivity; LLSp: lower limit of the 95% confidence interval for specificity; ULSe: upper limit of the 95% confidence interval for sensitivity; ULSp: upper limit of the 95% confidence interval for specificity.

Contributors
The authors contributed equally to this work.